Bounds on the Parameters of Locally Recoverable Codes

Itzhak Tamo, Alexander Barg, Alexey Frolov


Abstract —A locally recoverable code (LRC code) is a code 
over a finite alphabet such that every symbol in the encoding 
is a function of a small number of other symbols that form a 
recovering set. In this paper we derive new finite-length and 
asymptotic bounds on the parameters of LRC codes. For LRC 
codes with a single recovering set for every coordinate, we derive 
an asymptotic Gilbert-Varshamov type bound for LRC codes and 
find the maximum attainable relative distance of asymptotically 
good LRC codes. Similar results are established for LRC codes 
with two disjoint recovering sets for every coordinate. For the 
case of multiple recovering sets (the availability problem) we 
derive a lower bound on the parameters using expander graph 
arguments. Finally, we also derive finite-length upper bounds on 
the rate and distance of LRC codes with multiple recovering sets. 

Index Terms —Availability problem, asymptotic bounds, 
Gilbert-Varshamov bound, graph expansion, recovery graph 


I. Introduction 

Locally recoverable (LRC) codes currently form one of the rapidly developing topics in coding theory because of their applications in distributed and cloud storage systems. Recently LRC codes have been the subject of numerous publications, among them [?], [?], [25], [?], [30], [?], [?]. Let $Q$ be a $q$-ary alphabet. We say that a code $C\subseteq Q^n$ has locality $r$ if every symbol of the codeword $x\in C$ can be recovered from a subset of $r$ other symbols of $x$ (i.e., is a function of some other $r$ symbols $x_{i_1},x_{i_2},\dots,x_{i_r}$). In other words, this means that, given $x\in C$ and $i\in[n]$, there exists a subset of coordinates $\mathcal{R}_i\subseteq[n]\setminus i$, $|\mathcal{R}_i|\le r$, such that the restriction of $x$ to the coordinates in $\mathcal{R}_i$ enables one to find the value of $x_i$. The subset $\mathcal{R}_i$ is called a recovering set for the symbol $x_i$. Generalizing this concept, assume that every symbol of the code $C$ can be recovered from $t$ disjoint subsets of symbols of sizes $r_1,\dots,r_t$, respectively. Below we restrict ourselves to the case $r_1=\dots=r_t=r$, which makes the bounds obtained in the paper more compact. At the same time, we note that the technique presented below enables us to treat the general case as well.

Given a code $C\subseteq Q^n$ of size $q^k$ with $t$ disjoint recovering sets of size $r$, we use the notation $(n,k,r,t)$ to refer to its parameters. If the values of $n,k,r$ are understood, we simply call $C$ a $t$-LRC code.

More formally, denote by $C_I$ the restriction of the code $C$ to a subset of coordinates $I\subseteq[n]$. Given $a\in Q$, define the set of codewords $C(i,a)=\{x\in C: x_i=a\}$, $i\in[n]$.

Definition: A code $C$ is said to have $t$ disjoint recovering sets if for every $i\in[n]$ there are $t$ pairwise disjoint subsets $\mathcal{R}_i^1,\dots,\mathcal{R}_i^t\subseteq[n]\setminus i$ such that for all $j=1,\dots,t$ and every pair of symbols $a,a'\in Q$, $a\ne a'$,

$$C(i,a)_{\mathcal{R}_i^j}\cap C(i,a')_{\mathcal{R}_i^j}=\emptyset. \qquad (1)$$
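Condition (1) says that the restriction of a codeword to $\mathcal{R}_i^j$ determines $x_i$. For a small code given as an explicit list of codewords this can be checked by brute force. The Python sketch below is only an illustration: the helper names `is_recovering_set` and `has_disjoint_recovering_sets` are our own, and the binary $[4,3]$ single parity check code is an example we chose, not one discussed at this point in the paper.

```python
from itertools import product


def is_recovering_set(code, i, R):
    """Check condition (1): no two codewords agree on R but differ in coordinate i."""
    value_seen = {}  # restriction x_R -> the value x_i observed together with it
    for x in code:
        key = tuple(x[j] for j in R)
        if value_seen.setdefault(key, x[i]) != x[i]:
            return False  # C(i,a)_R and C(i,a')_R intersect for some a != a'
    return True


def has_disjoint_recovering_sets(code, i, sets):
    """Check that the sets are pairwise disjoint, avoid i, and each satisfies (1)."""
    used = set()
    for R in sets:
        if i in R or used & set(R):
            return False
        used |= set(R)
    return all(is_recovering_set(code, i, R) for R in sets)


# Example (our choice): the binary [4,3] single parity check code.
spc = [x for x in product((0, 1), repeat=4) if sum(x) % 2 == 0]
print(is_recovering_set(spc, 0, (1, 2, 3)))  # True: x_0 = x_1 + x_2 + x_3 mod 2
print(is_recovering_set(spc, 0, (1, 2)))     # False: x_3 is free, so x_0 is undetermined
```

For $t>1$ one would call `has_disjoint_recovering_sets` with the $t$ candidate sets of each coordinate.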

Having more than one recovering set is beneficial in practice 
because it enables more users to access a given portion of data, 
thus enhancing data availability in the system. 

In this paper we study upper and lower bounds on the parameters of $t$-LRC codes. Most of our results concern bounds on the attainable value of the minimum distance $d$ of a code $C$ given its parameters $(n,k,r,t)$. Since the main goal of LRC codes is to recover from one erased coordinate using its recovering set, it is not immediately clear why one would be interested in large values of the minimum distance. It is possible that more than one storage node has failed, necessitating higher separation of the codewords, but the probability of this event under the normal functioning of the system is low. To justify this problem from the perspective of applications, consider the situation when a cluster of nodes becomes inoperable due to either power failure or maintenance. In this case it is desirable to be able to switch from local to global decoding, and this is where large distance of the code becomes a useful feature.

A note on terminology: when we speak of lower bounds, 
our goal is to show that there exist codes, or sequences of 
codes, that attain a particular relation between the parameters 
(e.g., have large distance). In the case of upper bounds we 
aim to show that no code with given locality properties can 
have distance or rate greater than some function of the other 
parameters of the code. In the proof of the upper bounds we 
do not make any assumptions on the alphabet Q, while the 
lower bounds are proved using linear codes over finite fields. 

We note that the case of $t=1$ is by far the easiest because good LRC codes with high distance are well structured. However, even in this case lower bounds were largely absent from the literature. Namely, in the classic case, the asymptotic bounds for error-correcting codes pinpoint the value of the relative distance $\delta_0=(q-1)/q$ such that there exist asymptotically good codes for all smaller $\delta$, and there are no code sequences with positive rate for $\delta>\delta_0$. In this paper we remedy this situation by deriving a Gilbert-Varshamov (GV) type bound that implies the same conclusion for any constant value of $r$ (concurrent with our work, this question was also resolved in [?]).

For codes with multiple recovering sets, deriving bounds on the parameters is more involved because of the mutual interaction between the sets, which is difficult to quantify. For this problem we obtain the following results. First, we derive an upper bound on the maximum attainable rate of a $t$-LRC code expressed in terms of $r$ and $t$. We also derive an upper bound on the minimum distance of $t$-LRC codes given the cardinality of the code and the value of the locality parameter. Turning to lower bounds, we derive an asymptotic GV-type bound on the parameters of codes with $t=2$ disjoint recovering sets. This result again enables us to conclude that asymptotically good binary 2-LRC codes exist if and only if the relative distance $\delta<\delta_0=1/2$.

We also note that there is an obvious connection between 2-LRC codes and low-density parity-check (LDPC) codes whose graphs
do not have cycles of length 4. While we employ some ideas 
from LDPC codes for the derivation of GV-type bounds, direct 
application of bounds on LDPC codes does not lead to good 
results for the LRC problem. 

Existence of $t$-LRC codes with arbitrary $t$ and $r$ seems to be a difficult problem. We observe that there is a connection between local recovery and expansion properties of a certain graph related to the code. The best known expanders are constructed using the probabilistic method. Using them, we are able to show that there exist asymptotically good $q$-ary $t$-LRC codes for any $t$ and $r$ over alphabets of large size $q$.

The version of LRC codes considered above assumes that 
every coordinate of the code can be recovered from a few 
other coordinates. A less restricted version of this definition 
requires that this property applies only to information symbols 
of the codeword. Accordingly, the two versions of LRC codes 
are called codes with all-symbol locality and codes with 
information locality. In this paper we consider only the first of 
these possibilities. Codes with multiple disjoint repair groups 
under the information locality assumption were considered in [?], [?].

Finally, we mention some other extensions and generalizations of the locality problem. In [21], the notion of codes with locality was generalized to codes that enable cooperative recovery from multiple erasures. In particular, this paper studied LRC codes that support recovery of any $\ell$ failed code symbols by reading at most $r$ other code symbols. A related paper [?] studied codes that enable successive local recovery of two erasures performed using two recovering sets one after the other.


II. An overview of bounds for LRC codes

Below we give a brief overview of the known bounds on 
LRC codes with all-symbol locality. 


A. Known results, Single recovering set 

Clearly, any upper bound on the cardinality of a code 
with a given distance applies to LRC codes as well. We 
are interested in bounds that in addition take account of the 
locality constraint. 

Let $C$ be an $(n,k,r)$ LRC code. The rate of $C$ satisfies

$$\frac{k}{n}\le\frac{r}{r+1}. \qquad (2)$$

The minimum distance of $C$ satisfies

$$d\le n-k-\Big\lceil\frac{k}{r}\Big\rceil+2. \qquad (3)$$


These upper bounds on the distance and rate of LRC codes were proved in [?], [?]. The bound (3) forms a generalization of the classical Singleton bound in coding theory [?], and reduces to it for the maximum value of locality $r=k$. Recently codes that generalize Reed-Solomon codes and achieve the bound (3) for small code alphabets and any $n$ a multiple of $r+1$ were constructed in [28].

The bound (3) does not account for the size of the code alphabet $q$. A shortening bound on the distance that depends on $q$ was derived in [?]. To introduce it, denote by $M_q(n,d)$ the maximum cardinality of a code in the $q$-ary Hamming space with distance $d$ and let $k_q(n,d):=\log_q M_q(n,d)$. For any $q$-ary LRC code with the parameters $(n,k,r)$ and distance $d$,

$$k\le\min_{s\in\mathbb{Z}_+}\big\{sr+k_q(n-s(r+1),d)\big\}. \qquad (4)$$
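As a rough numerical illustration of (4), the sketch below replaces the generally unknown $k_q(m,d)$ by the sphere-packing (Hamming) upper bound; this keeps (4) valid, since any upper bound on $k_q$ may be substituted. The substitution, the helper names, and the parameters $n=10$, $r=4$, $d=3$ are our assumptions for the example. For comparison it also computes the largest $k$ allowed by (3).

```python
import math


def hamming_kq_upper(m, d, q):
    """Sphere-packing upper bound on k_q(m, d) = log_q M_q(m, d)."""
    e = (d - 1) // 2
    ball = sum(math.comb(m, i) * (q - 1) ** i for i in range(min(e, m) + 1))
    return m - math.log(ball, q)


def shortening_bound(q, n, r, d):
    """Right-hand side of (4) with k_q(., d) replaced by the sphere-packing bound.

    Assumes n >= r + 1 so that at least one value of s is available.
    """
    return min(s * r + hamming_kq_upper(n - s * (r + 1), d, q)
               for s in range(1, n // (r + 1) + 1))


def max_k_from_3(n, r, d):
    """Largest k allowed by (3), i.e. by k + ceil(k / r) <= n - d + 2."""
    return max(k for k in range(n + 1) if k + -(-k // r) <= n - d + 2)


n, r, d = 10, 4, 3
print(max_k_from_3(n, r, d))                     # 7
print(math.floor(shortening_bound(2, n, r, d)))  # 6 for binary codes
```

For these parameters (3) allows $k\le 7$, while the alphabet-dependent estimate excludes $k=7$ for binary codes.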

Turning to asymptotic bounds, let us introduce the notation

$$R_q(r,\delta)=\limsup_{n\to\infty}\frac{1}{n}\log_q M_q(n,r,\delta n) \qquad (5)$$

where $M_q(n,r,d)$ is the maximum cardinality of a code of length $n$, distance $d$, and locality $r$.

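Proposition 2.1 [?]: The following asymptotic bounds on the rate of $q$-ary codes with a single recovering set and locality $r$ hold true:

$$R_q(r,\delta)\le\frac{r}{r+1}(1-\delta),\quad 0\le\delta\le1 \qquad (6)$$

$$R_q(r,\delta)\le\frac{r}{r+1}\Big(1-\frac{q}{q-1}\delta\Big),\quad 0\le\delta\le\frac{q-1}{q} \qquad (7)$$

$$R_q(r,\delta)\le\min_{0\le\tau\le\frac{1}{r+1}}\Big\{\tau r+\big(1-\tau(r+1)\big)f_q\Big(\frac{\delta}{1-\tau(r+1)}\Big)\Big\} \qquad (8)$$

where

$$f_q(x):=h_q\Big(\frac{1}{q}\Big(q-1-x(q-2)-2\sqrt{(q-1)x(1-x)}\Big)\Big),$$
$$h_q(x):=-x\log_q\frac{x}{q-1}-(1-x)\log_q(1-x).$$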

Bounds (6), (7), (8) follow on substituting into (4) classical upper bounds on $M_q(n,d)$. Namely, (6) and (7) are obtained using the Singleton and Plotkin bounds, respectively [?], while (8) follows on substituting the linear programming bound for $q$-ary codes [?] (bound (6) can also be obtained by passing to the limit $n\to\infty$ in (3)). For small values of $\delta$ a bound slightly better than (8) can be obtained by using in (4) a better linear programming bound from [?].
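The bounds (6)-(8) are easy to evaluate numerically. The Python sketch below does so for $q=2$, $r=3$ (the parameters of Fig. 1(a)). Two simplifications are ours: the minimum over $\tau$ in (8) is taken on a uniform grid (which can only make the computed upper bound slightly larger), and $f_q(x)$ is set to $0$ for $x\ge(q-1)/q$ (at $x=(q-1)/q$ the defining expression already equals $0$).

```python
import math


def h_q(x, q):
    """q-ary entropy function h_q from Proposition 2.1 (with 0 log 0 = 0)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return math.log(q - 1, q)
    return -x * math.log(x / (q - 1), q) - (1 - x) * math.log(1 - x, q)


def f_q(x, q):
    """Linear programming function f_q, taken as 0 for x >= (q-1)/q."""
    if x >= (q - 1) / q:
        return 0.0
    inner = (q - 1 - x * (q - 2) - 2 * math.sqrt((q - 1) * x * (1 - x))) / q
    return h_q(inner, q)


def bound_6(r, delta):
    return r / (r + 1) * (1 - delta)


def bound_7(q, r, delta):
    return max(0.0, r / (r + 1) * (1 - q * delta / (q - 1)))


def bound_8(q, r, delta, steps=2000):
    """Bound (8), minimizing over tau in [0, 1/(r+1)] on a uniform grid."""
    best = math.inf
    for m in range(steps + 1):
        tau = m / (steps * (r + 1))
        rest = 1 - tau * (r + 1)
        local = rest * f_q(delta / rest, q) if rest > 0 else 0.0
        best = min(best, tau * r + local)
    return best


for delta in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"{delta:.1f}  {bound_6(3, delta):.4f}  {bound_7(2, 3, delta):.4f}  "
          f"{bound_8(2, 3, delta):.4f}")
```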
In the binary case, the bounds (6)-(8) are shown in Fig. 1(a) below together with a GV-type bound (19) derived in this paper; see Theorem B and Theorem 5.1. It is evident from the plot that bound (8) changes its behavior for large values of $\delta$. The reason for this is that when $\tau\to0$, the right-hand side of (8) approaches the value $f_q(\delta)$. At this point in the plot we switch to the classical (i.e., locality-unaware) linear programming bound on the rate of codes, $R_q(\delta)\le f_q(\delta)$.

The following proposition follows by concatenating several copies of a single parity check code (i.e., taking their direct sum).

Proposition 2.2: $R_q(r,0)\ge r/(r+1)$.
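The direct-sum construction behind Proposition 2.2 can be sketched in a few lines. For simplicity the sketch is binary ($q=2$); the function names and the choice of four blocks with $r=3$ are illustrative assumptions. Each coordinate is recovered from the other $r$ symbols of its block, and the rate is $r/(r+1)$.

```python
import random


def encode(message, r):
    """Direct sum of binary (r+1, r) single parity check codes: one parity bit per r bits."""
    if len(message) % r:
        raise ValueError("message length must be a multiple of r")
    word = []
    for start in range(0, len(message), r):
        block = message[start:start + r]
        word.extend(block + [sum(block) % 2])
    return word


def recover(word, i, r):
    """Recover x_i from the r other symbols of its block (its recovering set)."""
    start = (i // (r + 1)) * (r + 1)
    return sum(word[j] for j in range(start, start + r + 1) if j != i) % 2


r = 3
rng = random.Random(0)
message = [rng.randint(0, 1) for _ in range(4 * r)]
word = encode(message, r)
print(len(message) / len(word))                                       # 0.75
print(all(recover(word, i, r) == word[i] for i in range(len(word))))  # True
```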

Other upper bounds on the distance of LRC codes appear in [?], [?]. In particular, [?] gives an integer-programming based bound on the distance of LRC codes. The result of this paper is not expressed in a closed form, but is shown to improve the known bounds in many examples. In the case of linear cyclic LRC codes there is an obvious link between locality and the dual distance of the code, which enables one to use linear programming bounds on the code parameters. More details about this are given in [29].

As for lower bounds, the following results appear in the literature. By a straightforward adaptation of the GV argument [?], [?] one obtains the following proposition.

Proposition 2.3: A linear (n, k, r) LRC code with distance 
d exists if 


. ^Vq-1)* (9) 

i=0 ^ / 

Below in (19) we establish a more accurate version of the GV bound relying on ideas from bipartite-graph codes and LDPC codes.

Constructions of LRC codes on algebraic curves were recently proposed in [?]. Using asymptotically maximal curves, it is possible to construct sequences of LRC codes that improve the GV-type bound (19). The cardinality of the alphabet $q$ for which the improvement takes place depends on the value of locality $r$. For instance, for $r=2$ it is possible to construct codes that asymptotically exceed the GV bound (19) for alphabets of size $q\ge289$, and for $r=3$ for $q\ge361$.

A more general version of LRC codes was introduced in [?], which suggested considering codes whose coordinates can be partitioned into local codes which are $(r+\rho,r)$ MDS codes, for $\rho\ge2$. In our terms this extended definition implies that for every $\rho$-tuple of coordinates $(i_1,\dots,i_\rho)$ within the same local code there is a subset $\mathcal{R}$, $|\mathcal{R}|=r$, such that the symbols $x_{i_1},\dots,x_{i_\rho}$ of every codeword of the code can be reconstructed from the restriction of this codeword to the coordinates in $\mathcal{R}$. A generalization of the bound (3) for this type of codes was obtained in [?]. Structural properties and existence of codes attaining this bound were considered in [?], while an algebraic construction of codes that attain this bound was proposed in [28].


B. Known results, multiple recovering sets

The following bound on the distance of an $(n,k,r,t)$ LRC code was proved in [?]:

$$d\le n-k+2-\Big\lceil\frac{t(k-1)+1}{t(r-1)+1}\Big\rceil. \qquad (10)$$

This result is a direct generalization of the bound (3) and extends the argument in [?] from one to many recovering sets.

Turning to lower bounds on the cardinality of codes with multiple recovering sets, let us first assume that $t=2$. As observed in [?], [?], [?], a natural way to construct codes with two recovering sets arises by using two-level code constructions such as product codes or codes on bipartite graphs. For instance, consider the case of graph codes, and take the example of a code on a bipartite regular graph where the edges incident to every vertex form a codeword of the $[7,4,3]$ Hamming code $\mathcal{H}_3$. Clearly, every coordinate can be recovered from a parity check of 4 symbols in two independent ways, since the dual code has distance 4. In other words, we obtain an $(n,k,3,2)$ LRC code. It is possible to estimate the dimension and distance of the resulting code [?], and we obtain a family of asymptotically good 2-LRC codes. Many more examples can be constructed using this general approach. For greater $t$, similar constructions can be obtained using codes on regular hypergraphs [?].

To give a simple example of multilevel constructions, consider a product code formed of two single parity check codes with $r$ message symbols each. The rate of the resulting code equals $r^2/(r+1)^2$, and each symbol has locality $r$. Generalizing, we can construct the $t$-th power of the binary $(r+1,r)$ single parity check code and obtain a code with $t$ disjoint recovering sets that has rate $(r/(r+1))^t$.

Define

$$R_q^{(t)}(r,\delta)=\limsup_{n\to\infty}\frac{1}{n}\log_q M_q^{(t)}(n,r,\delta n) \qquad (11)$$

where $M_q^{(t)}(n,r,d)$ is the maximum cardinality of a $q$-ary code of length $n$, distance $d$, and $t$ recovering sets of size at most $r$ for every symbol. In the particular case of $t=1$ the quantity $R_q^{(1)}(r,\delta)$ is the same as the function $R_q(r,\delta)$ defined in (5) above. Clearly, $R_q^{(t)}(r,\delta)\le R_q(r,\delta)$, so all the upper bounds of the previous section apply to the current case.

From (10) we obtain

$$R_q^{(t)}(r,\delta)\le\frac{t(r-1)+1}{tr+1}(1-\delta),\quad 0\le\delta\le1. \qquad (12)$$

Below in (18) we will obtain a somewhat tighter asymptotic bound on $R_q^{(t)}$.

Algebraic constructions of LRC codes with $t\ge2$ recovering sets were considered in [?], [?]. Block designs were used in [31] to construct binary $t$-LRC codes for any $r$ and $t$, resulting in codes with minimum distance $d=t+1$.


C. New bounds on LRC codes

In this section we summarize the main contributions of this paper.

Theorem A. Let $C$ be an $(n,k,r,t)$ LRC code with $t$ disjoint recovering sets of size $r$. Then the rate of $C$ satisfies

$$\frac{k}{n}\le\frac{1}{\prod_{j=1}^{t}\big(1+\frac{1}{jr}\big)}. \qquad (13)$$

The minimum distance of $C$ is bounded above as follows:

$$d\le n-\sum_{i=0}^{t}\Big\lfloor\frac{k-1}{r^i}\Big\rfloor. \qquad (14)$$


Remarks:

1. For codes with a single recovering set for every symbol, the bound on the rate reduces to (2), which is a tight bound [28]. Currently $t=1$ is the only case for which the known bounds on the rate have been shown to be tight. For two recovering sets the bound (13) takes the form

$$\frac{k}{n}\le\frac{2r^2}{(r+1)(2r+1)}. \qquad (15)$$

At the same time, the product construction mentioned above gives $k/n=r^2/(r+1)^2$, which is only slightly less than the right-hand side of (15).

2. For codes with a single recovering set for every symbol, the bound on the distance (14) reduces to (3), and there exist large families of codes that meet this bound with equality [28], [25], [?]. The next interesting case, in particular for applications, is $t=2$. From (14) we obtain the bound

$$d\le n-\Big(k-1+\Big\lfloor\frac{k-1}{r}\Big\rfloor+\Big\lfloor\frac{k-1}{r^2}\Big\rfloor\Big). \qquad (16)$$

For some parameters this bound is also tight. For instance, consider the shortened binary Hamming code of length 6 with the parity-check matrix

$$\begin{pmatrix}0&0&0&1&1&1\\0&1&1&0&0&1\\1&0&1&0&1&0\end{pmatrix}.$$

It is easily seen that this is a $(6,3,2,2)$ LRC code, and its distance $d=3$ meets the bound (16) with equality.
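The claims in Remark 2 about the shortened Hamming code can be confirmed by exhaustive search. The sketch below is our own check, not taken from the paper: it enumerates the code defined by the parity-check matrix above, computes $k$ and $d$, reads off two recovering sets per coordinate from the dual codewords of weight $r+1=3$, and evaluates the right-hand side of (16).

```python
from itertools import product

# Parity-check matrix of the shortened binary Hamming code in Remark 2.
H = [
    [0, 0, 0, 1, 1, 1],
    [0, 1, 1, 0, 0, 1],
    [1, 0, 1, 0, 1, 0],
]
n, r, t = 6, 2, 2

# Codewords: all x in F_2^6 with H x^T = 0.
code = [x for x in product((0, 1), repeat=n)
        if all(sum(h * xi for h, xi in zip(row, x)) % 2 == 0 for row in H)]
k = len(code).bit_length() - 1            # |C| = 2^k
d = min(sum(x) for x in code if any(x))   # linear code: minimum nonzero weight

# Dual code: all F_2-combinations of the rows of H.
dual = {tuple(sum(c * row[j] for c, row in zip(coeffs, H)) % 2 for j in range(n))
        for coeffs in product((0, 1), repeat=len(H))}

# A dual codeword of weight r + 1 through coordinate i gives a recovering set of size r.
recovering = {
    i: [frozenset(j for j in range(n) if w[j] and j != i)
        for w in sorted(dual) if sum(w) == r + 1 and w[i]]
    for i in range(n)
}

bound_16 = n - (k - 1 + (k - 1) // r + (k - 1) // r**2)
print(k, d, bound_16)  # 3 3 3
for i in range(n):
    print(i, [sorted(R) for R in recovering[i]])
```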


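Corollary. The rate of an $(n,k,r,t)$ LRC code satisfies

$$\frac{k}{n}\le\frac{1}{\sqrt[r]{t+1}}. \qquad (17)$$

For any alphabet size $q$ the following asymptotic bound holds true:

$$R_q^{(t)}(r,\delta)\le\frac{r^t(r-1)}{r^{t+1}-1}(1-\delta),\quad 0\le\delta\le1. \qquad (18)$$

The bound is tighter than (12), the asymptotic version of the bound (10) given in [?], for all $R$.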

We also have

$$R_q^{(t)}(r,\delta)=0,\quad \frac{q-1}{q}\le\delta\le1.$$


r 

r 1 


<<^(a 0 )< 


r*(r - 1) 

j-t+i _ X ■ 


Proof: The estimate (17) follows from (13) and Lemma 3.2 proved below in Sect. III. Estimate (18) follows immediately from (14), since $\sum_{i=0}^{t}r^{-i}=\frac{r^{t+1}-1}{r^t(r-1)}$.

Let us show that (18) is a tighter bound than (12). We have

$$\frac{r^{t+1}-r^t}{r^{t+1}-1}-\frac{t(r-1)+1}{tr+1}=\frac{t(r-1)+1-r^t}{(r^{t+1}-1)(tr+1)}.$$

The numerator of the last fraction is negative for all $r,t\ge2$, which is easy to see, for instance, by induction on $r$. This proves our claim.

The result for large $\delta$ is implied by the Plotkin bound. The lower bound on $R_q^{(t)}(r,0)$ follows from the result of [31] mentioned above.
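The comparisons in Remark 1 and in the proof of the Corollary can be checked in exact rational arithmetic. The sketch below, with helper names of our choosing, tabulates the product-code rate $(r/(r+1))^t$, the bound (13), the bound (17), and the slopes of (18) and (12) (i.e., their values at $\delta=0$).

```python
from fractions import Fraction


def rate_bound_13(r, t):
    """Right-hand side of (13): 1 / prod_{j=1}^t (1 + 1/(j r))."""
    prod = Fraction(1)
    for j in range(1, t + 1):
        prod *= 1 + Fraction(1, j * r)
    return 1 / prod


def slope_18(r, t):
    """Coefficient of (1 - delta) in (18); requires r >= 2."""
    return Fraction(r**t * (r - 1), r**(t + 1) - 1)


def slope_12(r, t):
    """Coefficient of (1 - delta) in (12)."""
    return Fraction(t * (r - 1) + 1, t * r + 1)


def product_code_rate(r, t):
    """Rate of the t-th power of the (r+1, r) single parity check code."""
    return Fraction(r, r + 1) ** t


print(" r  t  product  (13)     (17)     (18)@0   (12)@0")
for r in range(2, 5):
    for t in range(1, 4):
        print(f"{r:2d} {t:2d}  {float(product_code_rate(r, t)):.4f}   "
              f"{float(rate_bound_13(r, t)):.4f}   {(t + 1) ** (-1 / r):.4f}   "
              f"{float(slope_18(r, t)):.4f}   {float(slope_12(r, t)):.4f}")
```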


Theorem B. The following asymptotic GV-type bounds for LRC codes hold true:

$$R_q(r,\delta)\ge1-\min_{0<s\le1}\Big\{\frac{1}{r+1}\log_q\Big((1+(q-1)s)^{r+1}+(q-1)(1-s)^{r+1}\Big)-\delta\log_q s\Big\}. \qquad (19)$$

R^g^ (a d) > ^ - ^mm^ { log, ^ (s) - d log, s}, 

' ( 20 ) 
where (s) is given in (|48])-(|4^ below. In particular, for 
q = 2 


,_n ^ / 


2 = 0 


X (l_s)*(’'+2-i). (21) 


We also have $R_q(r,\delta)>0$ for $0<\delta<(q-1)/q$,


i?,(r, 0 ) = 


Rqir,S)=0, 


r + 1 

r + 2 ^ r-' — 1 


( 22 ) 

(23) 

(24) 


Independently, the bound (19) was obtained in [?].
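The right-hand side of (19) involves a one-dimensional minimization that is easy to approximate. In this sketch the minimum over $s$ is taken on a uniform grid in $(0,1]$, an assumption of ours that can only lower the computed rate, so the printed values never exceed the true value of (19). The parameters $q=2$, $r=3$ match Fig. 1(a).

```python
import math


def gv_lrc(q, r, delta, grid=20000):
    """Evaluate the right-hand side of (19), minimizing over s on a grid in (0, 1]."""
    best = math.inf
    for m in range(1, grid + 1):
        s = m / grid
        # q times the weight enumerator of the q-ary (r+1, r) single parity check code at s
        enum = (1 + (q - 1) * s) ** (r + 1) + (q - 1) * (1 - s) ** (r + 1)
        best = min(best, math.log(enum, q) / (r + 1) - delta * math.log(s, q))
    return 1 - best


for delta in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"{delta:.1f}  {gv_lrc(2, 3, delta):.4f}")
```

At $\delta=0$ the value approaches $r/(r+1)$, and at $\delta=(q-1)/q$ it drops to $0$.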

In Fig. 1 below we show the GV-type bounds for LRC codes. In Fig. 1(a) the bound (19) is plotted together with the upper bounds (6)-(8). In Fig. 1(b) the bound (20) is plotted together with other lower and upper bounds.

The question of lower bounds for $t\ge3$ recovering sets becomes more difficult because of the complicated nature of the interaction between the sets. Using graph-theoretic arguments, we establish the following asymptotic result.


Theorem C. For sufficiently large $q$, there exists a sequence of $t$-LRC codes with locality $r\ge t$ and rate $R$, $0<R<1-t/(r+1)$, such that the relative distance $\delta$ is determined from the following two equations with respect to the unknowns $\delta,\gamma$:

$$\delta(1-t\gamma)=1-\frac{t}{r+1}-R \qquad (25)$$

- ^h{5y{r + 1 )) 

t r -\-l 

-(57(r+l)/l(;y^^;^) = 0 , (26) 

where $1/(r+1)\le\gamma\le1/t$, and $h=h_2$ is the binary entropy function (see the definition of $h_q$ after (8)).

In Fig. 1(c) below the bound of this theorem is plotted together with the Singleton-type bound (18).

Remarks. 1. Corner points: The pair $(R,\delta)=(1-t/(r+1),0)$ provides a trivial solution to (25)-(26), accounting for one of the two endpoints of the lower bound. At the same time, the pair $(\gamma,\delta)=(1/(r+1),1)$ satisfies (25)-(26), resulting in $(R,\delta)=(0,1)$.

2. For small values of $\delta$, Equations (25), (26) do not have a solution for $\gamma$ in the segment $[1/(r+1),1/t]$. This corresponds to $\delta$ smaller than the value $\delta^*$ that gives the maximum possible rate of $R=1-t/(r+1)$. For $\delta<\delta^*$ the best we can claim is the existence of codes of rate $R=1-t/(r+1)$, extending the bound by a horizontal tangent line. This setting is close to the problem of locally decodable codes [?], where we do not attempt to construct codes with some particular distance, focusing instead on the local decoding property and high rate. Note that finding the maximum rate of locally decodable codes is currently an open problem.

3. The Singleton bound. The Singleton bound on the distance of codes without the locality constraint is attained by Reed-Solomon codes for all values of the distance. Allowing the alphabet size to increase with $n$, we see that this bound is also asymptotically tight for $n\to\infty$. The same conclusion is true for LRC codes with a single recovering set because of the construction of [?]. At the same time, for codes with $t\ge2$ recovering sets there is a gap between the best known lower and upper bounds on $R_q^{(t)}(r,\delta)$ for all values of the alphabet size. It is not clear at this point which of the two bounds, (25)-(26) or (18), is loose, and it is possible that both can be improved.

4. The case of $t=2$ recovering sets: The simplest unresolved case, and in some sense the most interesting one for applications that require high availability, is the case of two recovering sets. Paper [?] gave an explicit construction of 2-LRC codes with relative distance

$$\delta\ge1-R\,\frac{r+1}{r-1}$$

for any given value of $q$. We claim that for large alphabets the result of Theorem C improves upon this bound. Indeed, for a given $R$ the curves in the $(\gamma,\delta)$ plane defined in (25) and (26) intersect for $\gamma>1/(r+1)$ (see the proof of Lemma 5.9). Moreover, by (25), $\delta=\delta(\gamma)$ is a strictly increasing function that takes the value $1-R(r+1)/(r-1)$ for $\gamma=1/(r+1)$. Hence we conclude that Theorem C establishes the existence of a sequence of 2-LRC codes with higher minimum distance than the construction of [?].
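Equation (25) is linear in $\delta$, so the two facts used in Remark 4 can be checked directly. The sketch below (hypothetical helper `delta_from_25`, sample values $r=5$, $R=1/4$ chosen by us) solves (25) for $\delta$, confirms that at $\gamma=1/(r+1)$ and $t=2$ it equals $1-R(r+1)/(r-1)$, and shows that $\delta$ grows with $\gamma$. It does not involve (26).

```python
from fractions import Fraction


def delta_from_25(R, r, t, gamma):
    """Solve (25), delta * (1 - t*gamma) = 1 - t/(r+1) - R, for delta."""
    return (1 - Fraction(t, r + 1) - R) / (1 - t * gamma)


r, R = 5, Fraction(1, 4)
at_left_end = delta_from_25(R, r, 2, Fraction(1, r + 1))
print(at_left_end, 1 - R * Fraction(r + 1, r - 1))   # 5/8 5/8
print(delta_from_25(R, r, 2, Fraction(1, 5)))         # larger gamma, larger delta
```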


III. An Upper Bound on the Rate of LRC Codes
In this section we prove estimate (13) in Theorem A.


A. The recovery graph 

Assume that coordinate $i$ has $t$ disjoint recovering sets $\mathcal{R}_i^1,\dots,\mathcal{R}_i^t$, each of size $r$, where $\mathcal{R}_i^l\subseteq[n]\setminus i$. Define a directed graph $G$ as follows. The set of vertices $V=[n]$ corresponds to the set of $n$ coordinates of the LRC code. The ordered pair of vertices $(i,j)$ forms a directed edge $i\to j$ if $j\in\mathcal{R}_i^l$ for some $l\in[t]$. We color the edges of the graph with $t$ distinct colors in order to differentiate between the recovering sets of each coordinate. More precisely, let $F_G:E(G)\to[t]$ be a coloring function of the edges, given by $F_G((i,j))=l$ if $j\in\mathcal{R}_i^l$. Thus, the out-degree of each vertex $i\in V=V(G)$ is $|\bigcup_{l=1}^t\mathcal{R}_i^l|=tr$, and the edges leaving $i$ are colored in $t$ colors. We call $G$ the recovery graph of the code $C$.

The following lemma will be used in the proof. 

Lemma 3.1: There exists a subset of vertices $U\subseteq V$ of size at least

$$n\Big(1-\frac{1}{\prod_{j=1}^{t}\big(1+\frac{1}{jr}\big)}\Big) \qquad (27)$$

such that for any $U'\subseteq U$, the induced subgraph $G_{U'}$ on the vertices $U'$ has at least one vertex $v\in U'$ such that its set of outgoing edges $\{(v,j): j\in U'\}$ is missing at least one color.

Proof: For a given permutation $\tau$ of the set of vertices $V=[n]$, we define the coloring of some of the vertices as follows: The color $j\in[t]$ is assigned to the vertex $v$ if

$$\tau(v)>\tau(m)\ \text{ for all } m\in\mathcal{R}_v^j. \qquad (28)$$

If this condition is satisfied for several recovering sets $\mathcal{R}_v^j$, the vertex $v$ is assigned any of the colors $j$ corresponding to these sets. Finally, if this condition is not satisfied at all, then the vertex $v$ is not colored.

Let $U$ be the set of colored vertices, and consider one of its subsets $U'\subseteq U$. Let $G_{U'}$ be the induced subgraph on $U'$. We claim that there exists $v\in U'$ such that its set of outgoing edges is missing at least one color in $G_{U'}$. Assume toward a contradiction that every vertex of $G_{U'}$ has outgoing edges of all $t$ colors. Choose a vertex $v\in U'$ and construct a walk through the vertices of $G_{U'}$ according to the following rule. If the path constructed so far ends at some vertex with color $j$, choose one of its outgoing edges also colored in $j$ and leave the vertex moving along this edge. By assumption, every vertex has outgoing edges of all $t$ colors, so this process, and hence this path, can be extended indefinitely. Since the graph $G_{U'}$ is finite, there will be a vertex, call it $v_1$, that is encountered twice. The segment of the path that begins at $v_1$ and returns to it has the form

$$v_1\to v_2\to\dots\to v_l,$$

where $v_l=v_1$. For any $i=1,\dots,l-1$ the vertex $v_i$ and the edge $(v_i,v_{i+1})$ are colored with the same color. Hence by the definition of the set $U$ we conclude that $\tau(v_i)>\tau(v_{i+1})$ for all $i=1,\dots,l-1$, which gives $\tau(v_1)>\tau(v_l)=\tau(v_1)$, a contradiction.

In order to show that there exists such a set $U$ of large cardinality, we choose the permutation $\tau$ randomly and uniformly among all the $n!$ possibilities and compute the expected cardinality of the set $U$.
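The argument of Lemma 3.1 and the recovery procedure of Sect. III-B can be simulated on a small example. The Python sketch below is our illustration, not the paper's: it uses the product of two binary $(r+1,r)$ single parity check codes with $r=3$ (so $t=2$, $n=16$, $k=9$), where each coordinate is recovered from the rest of its row or from the rest of its column. For random permutations $\tau$ it forms the set $U$ of colored vertices, checks that $U$ can be peeled off from its complement, and compares the average of $|U|$ with the expectation given by (29).

```python
import random
from fractions import Fraction
from math import comb, prod


def product_code_recovering_sets(r):
    """Coordinates of the (r+1) x (r+1) product of two single parity check codes.

    Cell (a, b) is coordinate a*(r+1) + b; its two disjoint recovering sets are
    the rest of its row and the rest of its column (t = 2, each of size r).
    """
    side = r + 1
    sets = {}
    for a in range(side):
        for b in range(side):
            row = frozenset(a * side + c for c in range(side) if c != b)
            col = frozenset(c * side + b for c in range(side) if c != a)
            sets[a * side + b] = [row, col]
    return sets


def colored_set(sets, tau):
    """The set U of Lemma 3.1: v is colored if tau(v) > tau(m) for all m in some R_v^j."""
    return {v for v, rec in sets.items()
            if any(all(tau[v] > tau[m] for m in R) for R in rec)}


def peel(sets, U):
    """Sect. III-B: recover the coordinates of U one by one from those outside U."""
    unknown = set(U)
    while unknown:
        # Lemma 3.1: some v has a recovering set containing no unknown coordinate.
        v = next((v for v in unknown
                  if any(unknown.isdisjoint(R) for R in sets[v])), None)
        if v is None:
            return False
        unknown.remove(v)
    return True


r, t = 3, 2
sets = product_code_recovering_sets(r)
n, k = len(sets), r * r
rng = random.Random(2024)
sizes = []
for _ in range(2000):
    perm = list(range(n))
    rng.shuffle(perm)          # tau: a uniformly random permutation of [n]
    U = colored_set(sets, perm)
    assert peel(sets, U)       # U is recoverable from its complement
    sizes.append(len(U))

# Probability that a vertex is colored: inclusion-exclusion vs. the closed form in (29).
p_ie = sum((-1) ** (j - 1) * comb(t, j) * Fraction(1, j * r + 1) for j in range(1, t + 1))
p_closed = 1 - 1 / prod(1 + Fraction(1, j * r) for j in range(1, t + 1))
mean_size = sum(sizes) / len(sizes)
print(p_ie, p_closed, mean_size, float(n * p_closed))
```

Since every codeword is determined by its values outside $U$, each sample also satisfies $n-|U|\ge k$.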

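Let $A_{v,j}$ be the event that (28) holds for the vertex $v$ and the color $j$. Since $\Pr(A_{v,j})$ does not depend on $v$, we suppress the subscript $v$ and write

$$\Pr(v\in U)=\Pr\Big(\bigcup_{j=1}^{t}A_j\Big).$$

Let us compute the probability of the event $\bigcup_{j=1}^tA_j$. Note that for any set $S\subseteq[t]$ the event that all the $A_j$, $j\in S$, occur simultaneously is the event that $\tau(v)$ is the largest of the $|S|r+1$ values that $\tau$ takes on $\{v\}\cup\bigcup_{j\in S}\mathcal{R}_v^j$, so its probability equals

$$\Pr\Big(\bigcap_{j\in S}A_j\Big)=\frac{1}{|S|r+1}.$$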
Fig. 1. Asymptotic bounds for LRC codes. (a) Bounds for codes with a single recovering set, binary codes, $r=3$: the GV-type bound (19) and the upper bounds (6) (the Singleton bound), (7) (the Plotkin bound), and (8) (the linear programming bound). (b) Asymptotic upper and lower bounds on codes with one and two recovering sets, binary codes, $t=1,2$, $r=3$: (i) the Singleton-type bound (18); (ii) the asymptotic GV-type bound on $R_2^{(2)}(r,\delta)$; for reference, (iii) the GV and (iv) the Plotkin bounds for codes with one recovering set (copied from part (a)). Note that $R_2(r,1/2)=R_2^{(2)}(r,1/2)=0$. (c) Bounds for LRC codes with $r=6$ and $t=3$ recovering sets (large $q$): the lower bound (25)-(26) together with the Singleton-type bound (18).


Hence by the inclusion-exclusion formula we get

$$\Pr\Big(\bigcup_{j=1}^{t}A_j\Big)=\sum_{j=1}^{t}(-1)^{j-1}\binom{t}{j}\Pr(A_1\cap\dots\cap A_j)=\sum_{j=1}^{t}(-1)^{j-1}\binom{t}{j}\frac{1}{jr+1}=1-\frac{1}{\prod_{j=1}^{t}\big(1+\frac{1}{jr}\big)}, \qquad (29)$$

where (29) follows from [13, p. 188]; equivalently, $\sum_{j=0}^{t}(-1)^j\binom{t}{j}\frac{1}{jr+1}=\int_0^1(1-x^r)^t\,dx=\prod_{j=1}^{t}\frac{jr}{jr+1}$. Now let $X_v$ be the indicator random variable for the event that $v\in U$; then

$$E(|U|)=\sum_{v\in V}E(X_v)=\sum_{v\in V}\Pr(v\in U)=n\Pr\Big(\bigcup_{j=1}^{t}A_j\Big)=n\Big(1-\frac{1}{\prod_{j=1}^{t}\big(1+\frac{1}{jr}\big)}\Big).$$

The proof is completed by observing that there exists at least one choice of $\tau$ for which $|U|\ge E(|U|)$. ■


B. Proof of the bound on the rate

Let $U\subseteq[n]$ be the set of vertices of cardinality as in (27) constructed in Lemma 3.1, and let $\bar U=[n]\setminus U$ be its complement in $[n]$. We claim that the value of every coordinate $i\in U$ can be recovered by accessing the coordinates in $\bar U$. To show this, we construct the following iterative procedure, which in each step is applied to the subset $U'\subseteq U$ formed of the coordinates whose values are still unknown. In the first step $U'=U$. By Lemma 3.1 the induced subgraph $G_{U'}$ contains a vertex $v\in U'$ that is missing one color, call it $l$. This means that the $l$-th recovering set of $v$ is entirely contained in $[n]\setminus U'$. Hence one can recover the value of the coordinate $v$ of the codeword by knowing the values of the coordinates in $[n]\setminus U'$. In the next step use the same argument for the set of coordinates $U'\setminus\{v\}$. In this way all the coordinates in $U$ are recovered step by step relying only on the values of the coordinates in $\bar U$. Thus every codeword is determined by its restriction to $\bar U$, so that $q^k=|C|\le q^{|\bar U|}$. Therefore,

$$k\le|\bar U|\le\frac{n}{\prod_{j=1}^{t}\big(1+\frac{1}{jr}\big)},$$

proving inequality (13).

To get a clearer impression of (13), observe that

$$\log\prod_{j=1}^{t}\Big(1+\frac{1}{jr}\Big)=\sum_{j=1}^{t}\log\Big(1+\frac{1}{jr}\Big)\approx\sum_{j=1}^{t}\frac{1}{jr}\approx\frac{1}{r}\log t.$$

Therefore, the value of the product in (13) is about $t^{1/r}$. More precisely, let us show that (13) implies the bound (17).

Lemma 3.2:

$$\sqrt[r]{t+1}\le\prod_{j=1}^{t}\Big(1+\frac{1}{jr}\Big)\le\sqrt[r]{t+1}\Big(1+\frac{t}{(t+1)r}\Big).$$
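Proof: For $i=0,\dots,r-1$ define the quantity

$$f_i=\prod_{j=1}^{t}\Big(1+\frac{1}{i+jr}\Big).$$

It can be easily seen that for any $i$,

$$f_i\le f_0\le f_i\prod_{j=1}^{t}\frac{1+\frac{1}{jr}}{1+\frac{1}{(j+1)r}}=f_i\,\frac{1+\frac{1}{r}}{1+\frac{1}{(t+1)r}}=f_i\Big(1+\frac{t}{(t+1)r+1}\Big). \qquad (30)$$

Here the first inequality holds because every factor of $f_i$ decreases in $i$, and the second because $i+jr<(j+1)r$, after which the product telescopes. Furthermore,

$$\prod_{i=0}^{r-1}f_i=\prod_{i=0}^{r-1}\prod_{j=1}^{t}\Big(1+\frac{1}{i+jr}\Big)=\prod_{j=r}^{(t+1)r-1}\Big(1+\frac{1}{j}\Big)=t+1. \qquad (31)$$

Using the inequalities (30) in (31), we obtain

$$\sqrt[r]{t+1}=\Big(\prod_{i=0}^{r-1}f_i\Big)^{1/r}\le f_0=\prod_{j=1}^{t}\Big(1+\frac{1}{jr}\Big)\le\sqrt[r]{t+1}\Big(1+\frac{t}{(t+1)r+1}\Big)<\sqrt[r]{t+1}\Big(1+\frac{t}{(t+1)r}\Big).$$

■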
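Identity (31) and inequalities (30) involve only rational numbers, so Lemma 3.2 can be checked exactly for small parameters. The sketch below is our own check with illustrative helper names; it compares $r$-th powers so that no irrational roots are needed.

```python
from fractions import Fraction


def f(i, r, t):
    """f_i = prod_{j=1}^t (1 + 1/(i + j r)), as in the proof of Lemma 3.2."""
    out = Fraction(1)
    for j in range(1, t + 1):
        out *= 1 + Fraction(1, i + j * r)
    return out


def check_lemma_3_2(r, t):
    fs = [f(i, r, t) for i in range(r)]
    product_all = Fraction(1)
    for fi in fs:
        product_all *= fi
    c = 1 + Fraction(t, (t + 1) * r + 1)
    assert product_all == t + 1                       # identity (31)
    assert all(fi <= fs[0] <= fi * c for fi in fs)    # inequalities (30)
    # Lemma 3.2, raised to the r-th power to stay in exact arithmetic.
    assert t + 1 <= fs[0] ** r < (t + 1) * (1 + Fraction(t, (t + 1) * r)) ** r
    return True


print(all(check_lemma_3_2(r, t) for r in range(1, 9) for t in range(1, 7)))  # True
```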


IV. Upper Bounds on the Minimum Distance of LRC Codes

In this section we prove the bound (14) of Theorem A. Our approach extends the idea of [11] used to prove the bound (3). We begin with a simple observation. Suppose we are given an LRC code $C$ with the parameters $(n,k,r,t)$ over a $q$-ary alphabet (field or not). Let $I\subseteq[n]$ be a subset of coordinates, and let $x_I$ be the restriction to $I$ of a codeword $x\in C$. Recall our notation $C_I:=\{x_I: x\in C\}$. Observe that if $I$ is such that $|C_I|<q^k$, then the distance of the code $C$ satisfies the inequality

$$d(C)\le n-|I|. \qquad (32)$$

Indeed, since $|C_I|<|C|$, two distinct codewords coincide on $I$ and therefore differ in at most $n-|I|$ coordinates. The main idea behind the proof of (14) is to construct a set $S$ of size $k-1$ such that the values of codeword coordinates in $S$ determine the values of the coordinates in some large subset $S'$. Since $|C_{S\cup S'}|\le q^{k-1}$, we can then apply (32) to derive a bound on $d(C)$.


A. Proof of the bound on d

Consider the recovery graph $G$ of an $(n,k,r,t)$ LRC code $C$ with $t$ recovering sets, defined in Sect. III-A. Consider the following coloring procedure of the vertices.* Start with an arbitrary subset of vertices $S\subseteq V$ and color it in some fixed color. Now let us color some of the remaining uncolored vertices according to the following rule. A vertex is colored if at least one of its recovering sets is completely colored. This process continues until no more vertices can be colored (recall that $G$ is finite). We denote the set of colored vertices obtained at this point by $\mathrm{Cl}(S)$ and call it the closure of $S$ in $G$. Call the quantity $|\mathrm{Cl}(S)|/|S|$ the expansion ratio of the set $S$. Clearly, a large expansion ratio means that the values of a large number of coordinates outside $S$ are determined by the values of the coordinates in $S$. We shall show that there is a subset with a large closure and use (32) to bound the code's distance.
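The closure operation is a simple fixed-point iteration. The sketch below applies it to the $(6,3,2,2)$ shortened Hamming code of Remark 2, whose recovering sets (0-indexed) we read off from the weight-3 dual codewords; the example and the names are our choices. Starting from a set $S$ of size $k-1=2$ it finds $|\mathrm{Cl}(S)|=3$, and (32) then gives $d\le3$, matching the true distance.

```python
# Recovering sets (0-indexed) of the (6, 3, 2, 2) shortened Hamming code of Remark 2,
# read off from the weight-3 codewords of its dual code.
RECOVERING = {
    0: [{2, 4}, {1, 3}],
    1: [{2, 5}, {0, 3}],
    2: [{1, 5}, {0, 4}],
    3: [{4, 5}, {0, 1}],
    4: [{3, 5}, {0, 2}],
    5: [{3, 4}, {1, 2}],
}
N, K = 6, 3


def closure(S, recovering):
    """Cl(S): keep coloring vertices that have a completely colored recovering set."""
    colored = set(S)
    changed = True
    while changed:
        changed = False
        for v, sets in recovering.items():
            if v not in colored and any(R <= colored for R in sets):
                colored.add(v)
                changed = True
    return colored


S = {0, 1}                            # |S| = k - 1
cl = closure(S, RECOVERING)
print(sorted(cl), len(cl) / len(S))   # [0, 1, 3] 1.5
# The values on S determine those on Cl(S), so |C_Cl(S)| <= 2^(k-1) < 2^k and (32) applies.
print(N - len(cl))                    # 3
```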

We begin with two lemmas. 

Lemma 4.1: Let $G$ be the recovery graph of an $(n,k,r,t)$ LRC code $C$. For any vertex $v\in G$ there exists a set $S$ of size at most $r^t$ such that $v\in\mathrm{Cl}(S)$, and the expansion ratio of $S$ is at least

$$e_t=\frac{r^{t+1}-1}{r^t(r-1)}. \qquad (33)$$


Proof: We use induction on t. For t = Q there are no edges 
in the graph. Define S = {?;} and note that 01(5) = S = {u}, 
and the expansion ratio is 1 as needed. Now assume that the 
claim is correct for t recovering sets. Let us prove it for f + 1 
recovering sets. Remove from G a vertex v. For each other 
vertex it ^ u we remove the edges that correspond to one of 
its recovering sets. Specifically, if u has a recovering set that 
contains v, we remove all of its edges that correspond to this 
recovering set; otherwise, remove the edges that correspond 
to any one of its recovering sets. Denote the resulting graph 
by Gi, and observe that each vertex of Gi has exactly t 
recovering sets. We will denote by Cli(-) the result of the 
closure operation in Gi and use a similar notation for other 
graphs in the proof. 

Let $v_1,\dots,v_l$ be the vertices of one of the recovering sets
of $v$, where $l\le r$. Our plan is to apply the induction
hypothesis successively for each of the $l$ vertices, where in
the $i$-th step we construct a subset of vertices $S_i\subseteq V(G')$
such that $v_i\in\mathrm{Cl}'(S_1\cup\dots\cup S_i)$. Suppose that the subsets
$S_1,\dots,S_{i-1}$ are already constructed. Color the vertices in
$\mathrm{Cl}'(S_1\cup\dots\cup S_{i-1})$ and let $G_i$ be the induced subgraph of
$G'$ on the non-colored vertices of $G'$, i.e., on the set of vertices
$$V(G_i)=V(G')\setminus\mathrm{Cl}'(S_1\cup\dots\cup S_{i-1}).$$

Let us describe the construction of the set $S_i$. If $v_i\in
\mathrm{Cl}'(S_1\cup\dots\cup S_{i-1})$, put $S_i=\emptyset$. Otherwise $v_i\in V(G_i)$.
Note that each vertex $u$ in $G_i$ has outgoing edges of all $t$
colors: otherwise, if $u$ is missing one color, it has a
recovering set that is contained in $\mathrm{Cl}'(S_1\cup\dots\cup S_{i-1})$, and
then also $u\in\mathrm{Cl}'(S_1\cup\dots\cup S_{i-1})$. Hence $G_i$ can be viewed
as a recovery graph of a code with $t$ recovering sets for each
coordinate.

* This coloring uses just one color (a vertex is colored or not) and is
different from the $t$-coloring of the edges introduced in the beginning of
Sect. IV. Both colorings will be used in the proof.

Apply the induction hypothesis to $G_i$ to find a set
$S_i$ of size at most $r^t$ and expansion ratio at least $e_t$ such that
$v_i\in\mathrm{Cl}_i(S_i)$, where $\mathrm{Cl}_i(\cdot)$ is the closure in $G_i$. Notice that
since $\mathrm{Cl}_i(S_i)$ is a subset of the vertices of the graph $G_i$, it is
disjoint from the set $\mathrm{Cl}'(S_1\cup\dots\cup S_{i-1})$. Furthermore, it is
easy to see that
$$\begin{aligned}
\mathrm{Cl}'(S_1\cup\dots\cup S_{i-1}\cup S_i)&=\mathrm{Cl}'\big(\mathrm{Cl}'(S_1\cup\dots\cup S_{i-1})\cup S_i\big)\\
&=\mathrm{Cl}'(S_1\cup\dots\cup S_{i-1})\cup\mathrm{Cl}_i(S_i)\\
&=\bigcup_{j=1}^{i}\mathrm{Cl}_j(S_j),
\end{aligned}\qquad(34)$$
where the union is in fact a disjoint union (here $G_1=G'$, so $\mathrm{Cl}_1=\mathrm{Cl}'$). We claim that the
set $S=\bigcup_{i=1}^{l}S_i$ satisfies the properties in the statement of the
lemma.

We need to show that $v\in\mathrm{Cl}(S)$. First let us show that for
any $i=1,\dots,l$ the vertex $v_i\in\mathrm{Cl}'(S)$. Indeed, by construction,
if $S_i$ is the empty set, then $v_i\in\mathrm{Cl}'(S_1\cup\dots\cup S_{i-1})$;
otherwise $v_i\in\mathrm{Cl}_i(S_i)$. We conclude that $\mathrm{Cl}'(S)$ contains a
complete recovering set $v_1,\dots,v_l$ of the vertex $v$, and therefore
$$\mathrm{Cl}(S)=\mathrm{Cl}'(S)\cup\{v\},\qquad(35)$$
where $\mathrm{Cl}(\cdot)$ is the closure operation in the original graph $G$
(recall that $G$ contains only one vertex more than $G'$, namely the
vertex $v$). The size of $S$ satisfies
$$|S|=\Big|\bigcup_{i=1}^{l}S_i\Big|=\sum_{i=1}^{l}|S_i|\le r\cdot r^t=r^{t+1}.$$

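Let us estimate the expansion ratio. By (34) and (35),
$$|\mathrm{Cl}(S)|=\Big|\bigcup_{i=1}^{l}\mathrm{Cl}_i(S_i)\cup\{v\}\Big|=1+\sum_{i=1}^{l}|\mathrm{Cl}_i(S_i)|.$$
Hence the expansion ratio of the set $S$ satisfies
$$\frac{|\mathrm{Cl}(S)|}{|S|}=\frac{1+\sum_{i=1}^{l}|\mathrm{Cl}_i(S_i)|}{|S|}\ge\frac{1}{r^{t+1}}+\sum_{i=1}^{l}\frac{|S_i|}{|S|}\cdot\frac{|\mathrm{Cl}_i(S_i)|}{|S_i|}\qquad(36)$$
$$\overset{(a)}{\ge}\frac{1}{r^{t+1}}+e_t\sum_{i=1}^{l}\frac{|S_i|}{|S|}=\frac{1}{r^{t+1}}+e_t=e_{t+1},\qquad(37)$$
where (a) follows since each set $S_i$ has expansion ratio at
least $e_t$ in $G_i$ (terms with $S_i=\emptyset$ are omitted from the sums). ■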
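The arithmetic behind (33), (36) and (37) can be sanity-checked with exact rational arithmetic. The sketch below (an illustration, not part of the proof; the function names are hypothetical) confirms for small $r\ge2$ and $t$ that the closed form of $e_t$ equals the geometric sum $\sum_{i=0}^{t}r^{-i}$, that $e_0=1$, and that $e_{t+1}=1/r^{t+1}+e_t$, which is the last step of the chain above.

```python
from fractions import Fraction


def e_closed(r, t):
    """Closed form (33): e_t = (r^(t+1) - 1) / (r^(t+1) - r^t), valid for r >= 2."""
    return Fraction(r ** (t + 1) - 1, r ** (t + 1) - r ** t)


def e_series(r, t):
    """The same quantity written as 1 + 1/r + ... + 1/r^t."""
    return sum(Fraction(1, r ** i) for i in range(t + 1))


for r in range(2, 6):
    assert e_closed(r, 0) == 1                    # base case of the induction
    for t in range(6):
        assert e_closed(r, t) == e_series(r, t)   # closed form vs geometric sum
        # last step of (36)-(37): 1/r^(t+1) + e_t = e_{t+1}
        assert e_closed(r, t + 1) == Fraction(1, r ** (t + 1)) + e_closed(r, t)
print(e_closed(3, 2))  # 13/9
```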

Lemma 4.2: Let $m$ be an integer whose base-$r$ representation is
$$m=\sum_i\alpha_i r^i;$$
then for an integer $t$,
$$\sum_{i=0}^{t}\Big\lfloor\frac{m}{r^i}\Big\rfloor=\sum_{i=0}^{t-1}\alpha_i r^i e_i+e_t r^t\Big\lfloor\frac{m}{r^t}\Big\rfloor,$$
where the quantity $e_i$ is defined in (33).

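Proof: Note that if $m=\sum_i\alpha_i r^i$ is the $r$-ary representation of $m$, then
$$\Big\lfloor\frac{m}{r^i}\Big\rfloor=\sum_{j\ge i}\alpha_j r^{j-i},$$
and recall that
$$e_t=\frac{r^{t+1}-1}{r^{t+1}-r^t}=\sum_{i=0}^{t}r^{-i}.$$
Then we have that
$$\begin{aligned}
\sum_{i=0}^{t}\Big\lfloor\frac{m}{r^i}\Big\rfloor&=\sum_{i=0}^{t}\sum_{j\ge i}\alpha_j r^{j-i}=\sum_{j\ge 0}\alpha_j r^{j}\sum_{i=0}^{\min(j,t)}r^{-i}\\
&=\sum_{j=0}^{t-1}\alpha_j r^{j}e_j+\sum_{j\ge t}\alpha_j r^{j}e_t\\
&=\sum_{j=0}^{t-1}\alpha_j r^{j}e_j+e_t r^t\Big\lfloor\frac{m}{r^t}\Big\rfloor,
\end{aligned}$$
and the result follows. ■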
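A brute-force check of Lemma 4.2 with exact fractions, over an arbitrary small range of $m$, $r$ and $t$. The helper names (`e`, `digits`, `lhs`, `rhs`) are hypothetical, and passing checks on a finite range illustrate, rather than prove, the identity.

```python
from fractions import Fraction


def e(r, t):
    """e_t = 1 + 1/r + ... + 1/r^t, the expansion bound of Lemma 4.1."""
    return sum(Fraction(1, r ** i) for i in range(t + 1))


def digits(m, r):
    """Base-r digits alpha_0, alpha_1, ... of m (least significant first)."""
    out = []
    while m:
        m, a = divmod(m, r)
        out.append(a)
    return out


def lhs(m, r, t):
    """sum_{i=0}^t floor(m / r^i)."""
    return sum(m // r ** i for i in range(t + 1))


def rhs(m, r, t):
    """sum_{i<t} alpha_i r^i e_i + e_t r^t floor(m / r^t)."""
    alpha = digits(m, r) + [0] * (t + 1)  # pad so alpha[i] exists for every i < t
    low = sum(alpha[i] * r ** i * e(r, i) for i in range(t))
    return low + e(r, t) * r ** t * (m // r ** t)


for r in range(2, 5):
    for t in range(4):
        for m in range(200):
            assert lhs(m, r, t) == rhs(m, r, t)
print(lhs(5, 2, 1), rhs(5, 2, 1))  # 7 7
```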

Remark: The coloring process of the vertices of $G$ used to
construct the closure of the subset is an instance of a large
class of models of influence propagation in networks. Similar
models were studied in the literature in a number of contexts
related to random and deterministic graphs. We point to a
recent paper [?], which studies the minimum size of the subset
$S$ of vertices of a regular expander whose closure under a
threshold decision rule equals the entire set of vertices $V$.
This paper also contains pointers to the literature on related
problems.


Proof of the upper bound on the distance (14): Let $G$
be the recovery graph of the code. We will use Lemma 4.1
for the graph $G$ several times. Assume that we are allowed to
color $k-1$ vertices and would like to color them in a way that
guarantees a large expansion ratio with respect to their closure.
Let $m\le t$ be the largest integer such that $r^m\le k-1$; then
according to Lemma 4.1 the graph $H_1:=G$ contains a subset
$S_1$ of vertices of size at most $r^m$ whose expansion ratio is
at least $e_m$. Color the vertices in $\mathrm{Cl}(S_1)$. Then denote by $H_2$
the subgraph induced on the subset of vertices $V\setminus\mathrm{Cl}(S_1)$ and
apply Lemma 4.1 to $H_2$, etc. Continuing this process, suppose
that in the $i$-th round there are $b_i$ vertices still to be colored
(out of a total budget of $k-1$ vertices), and let $H_i$ be the
induced subgraph of $G$ on the set of vertices that have not
been colored in the previous $i-1$ rounds. Each vertex in $H_i$
has outgoing edges of all $t$ colors, because if not, then one
of its recovering sets has already been removed, but then this
vertex itself cannot be present by definition of the closure. Let
$m\le t$ be the largest integer such that $r^m\le b_i$. Now apply
Lemma 4.1 to the graph $H_i$ to find a set $S_i$ of vertices of
size
$$|S_i|\le r^m\qquad(38)$$
and expansion ratio at least $e_m$, and color it. Notice that the
expansion ratio $e_m$ is an increasing function of $m$; hence, in
order to get large expansion, we would like to choose the
largest possible set $S_i$ under the budget constraint. Continue
this procedure until we have used all the $k-1$ vertices, and
call the obtained set of $k-1$ vertices $S$. Let
$$k-1=\sum_i\alpha_i r^i$$
be the $r$-ary representation of $k-1$. By (38), the sets $S_i$ in
the first $\lfloor (k-1)/r^t\rfloor$ steps of the procedure have expansion ratio at
least $e_t$, while the remaining sets $S_i$ could have a smaller
expansion. Hence the expansion of the set $S=\bigcup_i S_i$ is at least
$$|\mathrm{Cl}(S)|\ge\Big\lfloor\frac{k-1}{r^t}\Big\rfloor r^t e_t+\sum_{i=0}^{t-1}\alpha_i r^i e_i.\qquad(39)$$
Using Lemma 4.2, write (39) as
$$|\mathrm{Cl}(S)|\ge\sum_{i=0}^{t}\Big\lfloor\frac{k-1}{r^i}\Big\rfloor.$$
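The budget-splitting procedure can be simulated directly. The sketch below makes the accounting assumption used in (39), namely that in each round the chosen set has exactly $r^m$ vertices and a closure of exactly $e_m r^m$ vertices; under this assumption it reproduces the sum $\sum_{i=0}^{t}\lfloor (k-1)/r^i\rfloor$ obtained via Lemma 4.2. It only evaluates the bound and does not search an actual recovery graph; the function names are hypothetical.

```python
from fractions import Fraction


def e(r, m):
    """e_m = 1 + 1/r + ... + 1/r^m."""
    return sum(Fraction(1, r ** i) for i in range(m + 1))


def greedy_closure_bound(k, r, t):
    """Lower bound on |Cl(S)| produced by the procedure with budget k - 1.

    Assumes (for the accounting) that each round colors a set S_i of exactly
    r^m vertices whose closure has exactly e_m * r^m vertices.
    """
    budget, total = k - 1, Fraction(0)
    while budget > 0:
        m = 0
        while m < t and r ** (m + 1) <= budget:  # largest m <= t with r^m <= budget
            m += 1
        budget -= r ** m
        total += e(r, m) * r ** m
    return total


def floor_sum(k, r, t):
    """Form obtained via Lemma 4.2: sum_{i=0}^t floor((k-1)/r^i)."""
    return sum((k - 1) // r ** i for i in range(t + 1))


for r in range(2, 5):
    for t in range(1, 4):
        for k in range(1, 150):
            assert greedy_closure_bound(k, r, t) == floor_sum(k, r, t)
print(floor_sum(k=20, r=3, t=2))  # 19 + 6 + 2 = 27
```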


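By construction, $|S|=k-1$, and so $|C_S|\le q^{k-1}$. Clearly also
$|C_{\mathrm{Cl}(S)}|\le q^{k-1}$ (the coordinates in $\mathrm{Cl}(S)$ are functions of those in $S$); therefore, to finish the proof of (14), we take
$I=\mathrm{Cl}(S)$ in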


weight enumerator of a code of length $n$ is defined as the
polynomial
$$A(s)=\sum_{w=0}^{n}A_w s^w,\qquad(41)$$
where $A_w$ is the number of codewords of Hamming weight
$w$. The code with generator matrix $H_0$ contains $q-1$ collinear
vectors of weight $r+1$ and the zero vector, and therefore has
the weight enumerator $1+(q-1)s^{r+1}$. Using the MacWilliams
theorem [?, p. 146] we obtain
$$b(s)=\frac1q\Big((1+(q-1)s)^{r+1}+(q-1)(1-s)^{r+1}\Big).\qquad(42)$$
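Equation (42) can be checked by brute force for small parameters. The sketch assumes $q$ is prime, so that $\mathbb F_q$ arithmetic is integer arithmetic modulo $q$ (a prime power would need a proper field implementation); it enumerates the $[r+1,r]$ single parity check code and compares its weight distribution with the coefficients of $b(s)$ expanded by the binomial theorem.

```python
from fractions import Fraction
from itertools import product
from math import comb


def spc_weight_distribution(q, r):
    """Brute-force weight distribution of the [r+1, r] single parity check code over F_q (q prime)."""
    counts = [0] * (r + 2)
    for x in product(range(q), repeat=r + 1):
        if sum(x) % q == 0:  # x lies in the null space of H_0 = (1 1 ... 1)
            counts[sum(1 for xi in x if xi)] += 1
    return counts


def b_coefficients(q, r):
    """Coefficients of b(s) in Eq. (42), expanded with the binomial theorem."""
    n = r + 1
    return [Fraction(comb(n, w) * ((q - 1) ** w + (q - 1) * (-1) ** w), q)
            for w in range(n + 1)]


for q in (2, 3, 5):
    for r in range(1, 5):
        assert spc_weight_distribution(q, r) == b_coefficients(q, r)
print(spc_weight_distribution(3, 2))  # [1, 0, 6, 2]
```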

Theorem 5.1: (Gilbert-Varshamov bound for LRC codes =
Theorem B, Eq. (19))
$$R_q(r,\delta)\ge\frac{r}{r+1}-\min_{0<s\le1}\Big\{\frac{1}{r+1}\log_q b(s)-\delta\log_q s\Big\}.\qquad(43)$$

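Proof: The code given by the null space of $H_U$ is a
direct sum of $n/(r+1)$ single parity check codes, so its
weight enumerator equals $b(s)^{n/(r+1)}$. Let $B_w:=|\{x\in\mathbb F_q^n:
\mathrm{wt}(x)=w,\ H_Ux^T=0\}|$; then it is clear that
$$B_w\le\min_{0<s\le1}\frac{b(s)^{n/(r+1)}}{s^w}\qquad(44)$$
(indeed, $B_ws^w\le b(s)^{n/(r+1)}$ for every $s>0$, since all coefficients of $b(s)^{n/(r+1)}$ are nonnegative).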


V. Lower GV-type bounds for LRC codes 

Here we prove lower asymptotic bounds on the parameters 
of LRC codes with one and two recovering sets. The bounds 
are obtained by studying ensembles of random linear codes 
with locality properties and rely on a variation of Gallager’s
method, previously employed for LDPC codes [?] and later
for bipartite-graph codes [?].


Note that the value $s=1$ corresponds to the trivial estimate
$B_w\le\sum_w B_w=q^{rn/(r+1)}$. See also the remark after this proof
regarding the optimization region of $s$.

Let us turn to the matrix $H_L$. The number of rows of $H_L$
equals $n-k-n/(r+1)=n(r/(r+1)-R)$, and thus for
any nonzero vector $x\in\mathbb F_q^n$,
$$\Pr(H_Lx^T=0)=q^{-n(r/(r+1)-R)}.\qquad(45)$$


A. One recovering set, any alphabet 

In this section we prove the lower asymptotic bound on
$R_q(r,\delta)$ stated above in Theorem B, Eq. (19).

Let $C$ be a linear $(n,k,r)$ LRC code over $\mathbb F_q$. We will use
the fact that every code symbol is involved in at least one
low-weight parity check that contains it and at most $r$ other symbols. Suppose that
$n$ is divisible by $r+1$. Consider an $(n-k)\times n$ matrix over
$\mathbb F_q$ of the form $H=\begin{bmatrix}H_U\\H_L\end{bmatrix}$, where $H_U$ is a block-diagonal
matrix and $H_L$ has no special structure. Assume that $H_U$ has
the form
$$H_U=\begin{bmatrix}H_0&&&\\&H_0&&\\&&\ddots&\\&&&H_0\end{bmatrix},\qquad(40)$$
where $H_0$ is the parity-check matrix of an $[r+1,r]$ single
parity check code (i.e., a row of $r+1$ ones), and the blank
spots are filled with zeros. Construct an ensemble of matrices
$\mathcal H_q(n,k,r)=\{H\}$ by choosing the elements of $H_L$ uniformly
and independently at random from $\mathbb F_q$.

We will need the expression for the weight enumerator $b(s)$
of the code with the parity-check matrix $H_0$. Recall that the

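Using the union bound and the fact that the right-hand side of
(44) increases with $w$ for $0<s\le1$, we obtain the estimate
$$\Pr(\{\exists x\in\mathbb F_q^n:Hx^T=0,\ 0<\mathrm{wt}(x)<\delta n\})\le\delta n\,q^{-n(r/(r+1)-R)}\min_{0<s\le1}\frac{b(s)^{n/(r+1)}}{s^{\delta n}}.\qquad(46)$$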

Thus to prove that the ensemble contains codes with distance
$\ge\delta n$ it suffices to show that the right-hand side of (46) is
strictly less than one.

Let us compute the logarithm of the right-hand side of (46).
We obtain
$$\log_q\Big(\delta n\,q^{-n(r/(r+1)-R)}\min_{0<s\le1}\frac{b(s)^{n/(r+1)}}{s^{\delta n}}\Big)=n\Big(-\frac{r}{r+1}+R+\min_{0<s\le1}\Big\{\frac{1}{r+1}\log_q b(s)-\delta\log_q s\Big\}+o(1)\Big).\qquad(47)$$
Choosing $R$ such that for sufficiently large $n$ this quantity
becomes negative ensures that the probability $\Pr(\{\exists x\in\mathbb F_q^n:
Hx^T=0,\ 0<\mathrm{wt}(x)<\delta n\})\to0$ as $n\to\infty$. ■
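The right-hand side of (43) is easy to evaluate numerically. This sketch uses a ternary search over $s\in(0,1]$, which relies on the unimodality of the objective established in Lemma 5.2 and the Appendix; the lower end $10^{-12}$ of the search interval, the iteration count, and the function names are arbitrary illustrative choices.

```python
import math


def b(s, q, r):
    """b(s) from Eq. (42): weight enumerator of the [r+1, r] single parity check code."""
    return ((1 + (q - 1) * s) ** (r + 1) + (q - 1) * (1 - s) ** (r + 1)) / q


def gv_lrc_rate(q, r, delta, iters=200):
    """Right-hand side of (43):
    r/(r+1) - min_{0<s<=1} { log_q b(s) / (r+1) - delta * log_q s }.

    Ternary search relies on the objective being unimodal on (0, 1] (Lemma 5.2).
    """
    def objective(s):
        return math.log(b(s, q, r), q) / (r + 1) - delta * math.log(s, q)

    lo, hi = 1e-12, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if objective(m1) < objective(m2):
            hi = m2
        else:
            lo = m1
    return r / (r + 1) - objective((lo + hi) / 2)


for delta in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"q=2, r=4, delta={delta}: R >= {gv_lrc_rate(2, 4, delta):.4f}")
```

For example, `gv_lrc_rate(2, 4, 0.0)` is approximately $4/5=r/(r+1)$, and the value drops to about zero at $\delta=(q-1)/q=1/2$.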


Remarks: 1. It may seem that we are unnecessarily restricting
the optimization region in the proof to $s\in(0,1]$ and
that by allowing all $s>0$ we may be able to tighten the
resulting bound. In the next lemma (proved in the Appendix)
we show that this is not the case, and no loss ensues from this
restriction.

Lemma 5.2: Let $b(s)$ be the function defined in (42), and let
$1\le d,r\le n$ be positive integers. For $s>0$, the function
$\frac{1}{r+1}\log_q b(s)-\frac{d}{n}\log_q s$ has a unique minimum, which is attained for $0<s\le1$.

Lemma 5.5: The code with the parity-check matrix $H_0$ has
dimension $(r+1)\big(\frac{r+2}{2}-1\big)$ and the weight enumerator

2. From the above proof it is possible to obtain a finite-length
lower bound on LRC codes. The following proposition
follows from (46).

Proposition 5.3: Let $n,k,r$ be positive integers such that
$(r+1)\mid n$ and $r\le k\le nr/(r+1)$. If a positive integer $d\le n$
satisfies the inequality
$$d\,q^{k-\frac{rn}{r+1}}\min_{0<s\le1}\frac{b(s)^{n/(r+1)}}{s^{d}}<1,$$
there exists a $q$-ary $(n,k,r)$ linear LRC code with distance $d$.

This bound is generally better than the direct adaptation of
the GV argument given in (?).
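Proposition 5.3 can be turned into a simple search for the largest distance it certifies. The sketch below evaluates the left-hand side in base-$q$ logarithms to avoid overflow and reuses a ternary search over $s$; it takes the inequality in the form displayed above, and the function names and example parameters are illustrative assumptions.

```python
import math


def b(s, q, r):
    """b(s) from Eq. (42)."""
    return ((1 + (q - 1) * s) ** (r + 1) + (q - 1) * (1 - s) ** (r + 1)) / q


def log_q_lhs(n, k, r, q, d, iters=200):
    """log_q of  d * q^(k - rn/(r+1)) * min_{0<s<=1} b(s)^(n/(r+1)) / s^d."""
    def g(s):
        return (n / (r + 1)) * math.log(b(s, q, r), q) - d * math.log(s, q)

    lo, hi = 1e-12, 1.0
    for _ in range(iters):  # ternary search; g is unimodal on (0, 1] (cf. Lemma 5.2)
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if g(m1) < g(m2):
            hi = m2
        else:
            lo = m1
    return math.log(d, q) + k - r * n / (r + 1) + g((lo + hi) / 2)


def largest_certified_distance(n, k, r, q):
    """Largest d <= n satisfying the sufficient condition of Proposition 5.3 (0 if none)."""
    assert n % (r + 1) == 0 and r <= k <= n * r // (r + 1)
    best = 0
    for d in range(1, n + 1):
        if log_q_lhs(n, k, r, q, d) < 0:
            best = d
    return best


print(largest_certified_distance(n=60, k=20, r=4, q=2))
```

With $k=nr/(r+1)$ the condition can never hold, because $b(s)\ge1$ and $s^{-d}\ge1$ on $(0,1]$; the function then returns 0.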

Corollary 5.4: (= Theorem B, Eq. (22)) For any fixed $r$,
$R_q(r,0)=r/(r+1)$, and $R_q(r,\delta)=0$ if and only if $\delta\ge
(q-1)/q$.

Proof: From (?) we obtain that $R_q(r,0)\le r/(r+1)$, while
Proposition 2.2 implies the reverse inequality. This proves the
first statement.

Next we claim that $R_q(r,\delta)>0$ for all $0<\delta<(q-1)/q$.
This follows because the right-hand side of (43) is positive
for all $\delta<(q-1)/q$. The last claim follows from the proof
of Lemma 5.2 in the Appendix; see in particular the remarks
after the end of the proof.

At the same time, the Plotkin bound (even without the locality
constraint) implies that $R_q(r,\delta)=0$ for any $\delta\ge(q-1)/q$.
This completes the proof. ■


B. Two recovering sets 


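Consider LRC codes with two disjoint recovering sets for
every symbol. Construct an ensemble of parity-check matrices
as follows. Assume that $n$ is a multiple of $\binom{r+2}{2}$, namely,
$n=m(r+1)(r+2)/2$. Let $H$ be the matrix of the form $H=\begin{bmatrix}H_U\\H_L\end{bmatrix}$,
where the submatrices $H_U$ and $H_L$ are of dimensions
$m(r+1)\times n$ and $(n-k-m(r+1))\times n$, respectively. The
matrix $H_U$ in (40) is again block-diagonal, but this time the
matrix $H_0$ is of dimensions $(r+1)\times\binom{r+2}{2}$ and is the edge-vertex
incidence matrix of a complete graph $K_{r+2}$ with one
row deleted (deleting the row ensures that the remaining rows
are linearly independent). The elements of the bottom matrix
$H_L$ are chosen uniformly and independently at random from
the field $\mathbb F_q$. The number of rows of the matrix $H_L$ now equals
$$(1-R)n-m(r+1)=\Big(\frac{r}{r+2}-R\Big)n.$$

$$g_q^{(r)}(s)=q^{-(r+2)}\sum_{i_0+\dots+i_{q-1}=r+2}\binom{r+2}{i_0,\dots,i_{q-1}}(1+(q-1)s)^{\binom{r+2}{2}-E(i_0,\dots,i_{q-1})}(1-s)^{E(i_0,\dots,i_{q-1})},\qquad(48)$$
where
$$E(i_0,\dots,i_{q-1})=\begin{cases}\binom{r+2}{2}-\sum_{j=0}^{q-1}\binom{i_j}{2}, & q\text{ even},\\ \binom{r+2}{2}-\binom{i_0}{2}-\sum_{j=1}^{(q-1)/2}i_j\,i_{q-j}, & q\text{ odd}.\end{cases}\qquad(49)$$
In particular, $g_2^{(r)}(s)$ is given in Eq. (21).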

Proof: Even though the formula for $g_2^{(r)}(s)$ can be
obtained from (48)-(49), we give an independent proof. On the
one hand, the case $q=2$ is arguably the most interesting; on
the other, it makes it easier to understand the general argument.
Below by $\bar H_0$ we denote the full edge-vertex incidence matrix
of the graph $K_{r+2}$, before one row is deleted from it to obtain
$H_0$. Its dimensions are $(r+2)\times\binom{r+2}{2}$, and $\mathrm{rk}(\bar H_0)=r+1$.

Case $q=2$. Consider the code spanned by the rows
of $\bar H_0$ over $\mathbb F_2$. Let $V_1$ be a subset of vertices of $K_{r+2}$
and consider the codeword $x$ given by the sum of the rows that
correspond to the vertices in $V_1$. Each coordinate corresponds
to an edge, and any edge with both ends in $V_1$ or both ends in
$V_1^c$ accounts for a zero coordinate of $x$. Moreover, the nonzero
coordinates are precisely those edges with one end in $V_1$ and
the other in $V_1^c$. Thus, the weight enumerator of the code
equals
$$\frac12\sum_{i=0}^{r+2}\binom{r+2}{i}z^{i(r+2-i)}.$$
The factor $1/2$ accounts for the fact that $\mathrm{rk}(\bar H_0)=r+1$, so
the above procedure counts every code vector twice, once for
the subset $V_1$ and the second time for $V_1^c$.

The expression for $g_2^{(r)}(s)$ now follows on applying the
MacWilliams theorem [?, p. 146].

Arbitrary $q$. Let $\mathbb F_q=\{\alpha_0=0,\alpha_1,\dots,\alpha_{q-1}\}$, where for
$q$ odd the numbering of the elements of the field is such that
$\alpha_i=-\alpha_{q-i}$.

Consider the code $C'$ spanned by the rows of $H_0$ (or of
$\bar H_0$) over $\mathbb F_q$. Let $x$ be a codeword in $C'$ and assume that $x$
is a linear combination of the rows of $\bar H_0$ with coefficients
$a_1,a_2,\dots,a_{r+2}$. The coordinates of $x$ correspond to the
edges, and $x_j=0$ if and only if the coefficients at the ends of the $j$th edge add
to zero, i.e., if $a_{j_1}+a_{j_2}=0$, where $j_1$ and $j_2$ are the vertices
connected by edge $j$. Let
$$i_j:=|\{l\in\{1,2,\dots,r+2\}:a_l=\alpha_j\}|,\qquad j=0,1,\dots,q-1,$$
be the composition of the coefficient vector. If $q$ is even, then
the coordinate $x_j=0$ if and only if $a_{j_1}=a_{j_2}$. If $q$ is odd, then
$x_j=0$ if and only if either $a_{j_1}=a_{j_2}=0$ or $0\ne a_{j_1}=-a_{j_2}$.

Suppose that the composition $(i_0,i_1,\dots,i_{q-1})$ is fixed. Then
for even $q$ the number of nonzero coordinates $x_j$ is given by
the first of the two expressions for $E(i_0,i_1,\dots,i_{q-1})$ in (49), while
for odd $q$ it is given by the second expression.

Recalling that $\mathrm{rk}(H_0)=r+1$, we see that every codeword
was counted $q$ times. Therefore, the weight enumerator of the
code $C'$ equals
$$\frac1q\sum_{i_0+\dots+i_{q-1}=r+2}\binom{r+2}{i_0,\dots,i_{q-1}}z^{E(i_0,\dots,i_{q-1})}.$$
The proof is finished by the application of the MacWilliams
theorem. ■
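For $q=2$ the cut-space description in the proof can be verified exhaustively for small $r$. The sketch builds the full incidence matrix $\bar H_0$ of $K_{r+2}$, enumerates its $\mathbb F_2$ row space (each codeword once), and compares the weight distribution with $\frac12\sum_i\binom{r+2}{i}z^{i(r+2-i)}$. It is exponential in $r$ and intended only as a check; the function names are hypothetical.

```python
from itertools import combinations, product
from math import comb


def incidence_rows(num_vertices):
    """Rows of the full vertex-edge incidence matrix of K_{num_vertices} over F_2."""
    edges = list(combinations(range(num_vertices), 2))
    return [[1 if v in e else 0 for e in edges] for v in range(num_vertices)]


def row_space_weight_distribution(rows):
    """Weight distribution of the F_2 row space, counting each codeword once."""
    codewords = set()
    for coeffs in product((0, 1), repeat=len(rows)):
        word = tuple(sum(c * row[j] for c, row in zip(coeffs, rows)) % 2
                     for j in range(len(rows[0])))
        codewords.add(word)
    dist = {}
    for word in codewords:
        w = sum(word)
        dist[w] = dist.get(w, 0) + 1
    return dist


def cut_formula(r):
    """(1/2) * sum_i C(r+2, i) z^{i(r+2-i)}, as a dict weight -> count."""
    dist = {}
    for i in range(r + 3):
        w = i * (r + 2 - i)  # number of edges crossing the cut (V_1, V_1^c), |V_1| = i
        dist[w] = dist.get(w, 0) + comb(r + 2, i)
    return {w: c // 2 for w, c in dist.items()}


for r in range(1, 5):
    assert row_space_weight_distribution(incidence_rows(r + 2)) == cut_formula(r)
print(cut_formula(2))  # {0: 1, 3: 4, 4: 3}
```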


The following theorem is proved by computing the expected
distance of the code in the ensemble given by the matrices $H$.

Theorem 5.6: (Gilbert-Varshamov bound for 2-LRC codes
= Theorem B, Eqns. (20) and (24))
$$R_q^{(2)}(r,\delta)\ge\frac{r}{r+2}-\min_{0<s\le1}\Big\{\frac{2}{(r+1)(r+2)}\log_q g_q^{(r)}(s)-\delta\log_q s\Big\}.\qquad(50)$$
In particular, $R_q^{(2)}(r,0)\ge\frac{r}{r+2}$, and $R_q^{(2)}(r,\delta)=0$ if and only
if $\delta\ge\frac{q-1}{q}$.

Proof: First let us show that every coordinate of the null 
space of H has two disjoint recovering sets of size r + 1. 
Consider a subset of coordinates of size $\binom{r+2}{2}$ that corresponds
to one instance of $H_0$. Every edge in the graph $K_{r+2}$ is
connected to two vertices, and the rows of $H_0$ that correspond to
these vertices contain two sets of ones that intersect only
in the chosen edge. These sets form the recovering sets of
the chosen edge, and they are obviously disjoint (because the
graph does not contain multiple edges).

The remaining part of the proof is computational. It follows
the steps (41)-(47) and is completely analogous to the proof
of Theorem 5.1 and Corollary 5.4. ■
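As an illustration of (50) for $q=2$, the sketch below evaluates $r/(r+2)-\min_{0<s\le1}\{\frac{2}{(r+1)(r+2)}\log_2 g_2^{(r)}(s)-\delta\log_2 s\}$, computing $g_2^{(r)}(s)$ from the cut-space enumerator via the MacWilliams identity. It minimizes over a log-spaced grid of $s\in[10^{-9},1]$ rather than by ternary search, because unimodality is proved above only for $b(s)$; since a grid can only overestimate the minimum, the result slightly underestimates the bound. The function names and grid size are arbitrary choices.

```python
import math
from math import comb


def g2(s, r):
    """Weight enumerator of the binary code with parity-check matrix H_0 (from K_{r+2}):
    2^{-(r+2)} * sum_i C(r+2, i) (1 - s)^{i(r+2-i)} (1 + s)^{N - i(r+2-i)}, N = C(r+2, 2).
    """
    N = comb(r + 2, 2)
    total = 0.0
    for i in range(r + 3):
        w = i * (r + 2 - i)  # weight of the cut defined by an i-subset of vertices
        total += comb(r + 2, i) * (1 - s) ** w * (1 + s) ** (N - w)
    return total / 2 ** (r + 2)


def gv_2lrc_rate_binary(r, delta, points=5000):
    """r/(r+2) - min_s {2/((r+1)(r+2)) log2 g2(s) - delta log2 s} over a grid that includes s = 1."""
    c = 2 / ((r + 1) * (r + 2))
    grid = (10 ** (-9 + 9 * j / (points - 1)) for j in range(points))
    best = min(c * math.log2(g2(s, r)) - delta * math.log2(s) for s in grid)
    return r / (r + 2) - best


for delta in (0.0, 0.1, 0.2, 0.3, 0.4, 0.5):
    print(f"r=4, delta={delta}: R >= {gv_2lrc_rate_binary(4, delta):.4f}")
```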


By Hall’s matching theorem, such a family contains a system
of distinct representatives, i.e., a set of $|\mathcal S|$ distinct elements
$a_i\in[m]$ such that $a_i\in A_i$ for each $i$.
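Hall's condition and a system of distinct representatives can be computed for small families. The sketch below checks Hall's condition by brute force (exponential in the family size) and finds distinct representatives with the standard augmenting-path method for bipartite matching (Kuhn's algorithm), a textbook technique that is not taken from the paper; the function names are hypothetical.

```python
from itertools import combinations


def satisfies_hall(family):
    """Brute-force Hall check: every subfamily S' has |S'| <= |union of S'|."""
    for size in range(1, len(family) + 1):
        for sub in combinations(family, size):
            if len(set().union(*sub)) < size:
                return False
    return True


def distinct_representatives(family):
    """Find a system of distinct representatives via augmenting paths.

    Returns a list rep with rep[i] in family[i] and all entries distinct,
    or None if no such system exists.
    """
    owner = {}  # element -> index of the set it currently represents

    def augment(i, seen):
        for a in family[i]:
            if a not in seen:
                seen.add(a)
                if a not in owner or augment(owner[a], seen):
                    owner[a] = i
                    return True
        return False

    for i in range(len(family)):
        if not augment(i, set()):
            return None
    rep = [None] * len(family)
    for a, i in owner.items():
        rep[i] = a
    return rep


family = [{1, 2}, {2, 3}, {1, 3}, {3, 4}]
print(satisfies_hall(family), distinct_representatives(family))
```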

Lemma 5.7: Let $\mathcal S$ be a family of subsets of an $m$-set that
satisfies Hall’s condition. For a sufficiently large $q$, the vectors
$x_A$, $A\in\mathcal S$, defined in (52) are linearly independent
with positive probability, which tends to one as $q\to\infty$.

Proof: Let $|\mathcal S|=s$ and consider the $m\times s$ matrix $M$
whose columns are the vectors $\{x_A, A\in\mathcal S\}$. Note that (53)
implies that $s\le|\bigcup_{A\in\mathcal S}A|\le m$. We will show that with
high probability the rank of the matrix $M$ is $s$. Since $\mathcal S$
contains a system of distinct representatives, there exists an
injective mapping $f$ from the columns of $M$ to its set of
rows such that the random variable $X_{f(A),A}$ is not identically
zero for any $A\in\mathcal S$. Let $M'$ be the $s\times s$ submatrix of $M$
restricted to the rows $f(A)$, $A\in\mathcal S$. The determinant of this
matrix is a homogeneous polynomial of total degree $s$ that
is not identically zero: for instance, it contains the nonzero
term $\prod_{A\in\mathcal S}X_{f(A),A}$. By the Schwartz-Zippel Lemma [24] we
have that $P(\det M'=0)\le s/|\mathbb F_q^*|$, where the probability is
computed with respect to the random choice in (52). In other
words,
$$P(\mathrm{rk}\,M=s)\ge P(\mathrm{rk}(M')=s)\ge1-\frac{s}{q-1}.$$
■
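The bound $1-s/(q-1)$ can be compared with a Monte Carlo experiment. The sketch assumes $q=p$ is prime so that Gaussian elimination can be carried out modulo $p$; it draws the matrix $M$ with the zero pattern prescribed by (52) and independent uniform nonzero entries elsewhere, and estimates the probability that $M$ has full column rank. The family, trial count, seed and function names are arbitrary illustrative choices.

```python
import random


def rank_mod_p(matrix, p):
    """Rank of an integer matrix over the prime field F_p (Gauss-Jordan elimination)."""
    M = [[x % p for x in row] for row in matrix]
    rank = 0
    n_cols = len(M[0]) if M else 0
    for c in range(n_cols):
        pivot = next((i for i in range(rank, len(M)) if M[i][c]), None)
        if pivot is None:
            continue
        M[rank], M[pivot] = M[pivot], M[rank]
        inv = pow(M[rank][c], p - 2, p)  # inverse by Fermat's little theorem (p prime)
        M[rank] = [(x * inv) % p for x in M[rank]]
        for i in range(len(M)):
            if i != rank and M[i][c]:
                factor = M[i][c]
                M[i] = [(a - factor * b) % p for a, b in zip(M[i], M[rank])]
        rank += 1
    return rank


def random_pattern_matrix(family, m, p, rng):
    """m x s matrix with columns x_A as in (52): uniform entries of F_p^* on A, zeros off A."""
    return [[rng.randrange(1, p) if i in A else 0 for A in family] for i in range(m)]


def full_rank_frequency(family, m, p, trials=2000, seed=1):
    """Empirical probability that the random matrix has rank |family|."""
    rng = random.Random(seed)
    s = len(family)
    hits = sum(rank_mod_p(random_pattern_matrix(family, m, p, rng), p) == s
               for _ in range(trials))
    return hits / trials


family = [{0, 1}, {1, 2}, {0, 2}]  # satisfies Hall's condition, m = 3
for p in (5, 11, 101):
    print(p, full_rank_frequency(family, m=3, p=p), "lemma bound:", 1 - len(family) / (p - 1))
```

For this family $\det M$ is a sum of two monomials that vanishes with probability exactly $1/(p-1)$, so the estimates should be close to $1-1/(p-1)$, comfortably above the lemma's guarantee $1-3/(p-1)$.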

Let $G=(V=V_1\cup V_2,E)$ be a biregular bipartite graph
with $\deg u=t$ for $u\in V_1$ and $\deg v=r+1$ for $v\in V_2$.
Suppose that $|V_1|=n$; then the number of vertices in $V_2$ is
$p=nt/(r+1)$. The graph $G$ is called a $(t,r+1,\delta,\gamma)$-expander
if every subset $T\subseteq V_1$ with $|T|\le\delta n$ has at least $\gamma t|T|$
neighbors in $V_2$, where $0<\delta,\gamma<1$. The following result,
due to [?], is cited here in the form given in [23, p. 431].
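For intuition, the expansion property can be checked by brute force on toy graphs. The sketch below is exponential in $n$ and purely illustrative; its example uses the graph whose left vertices are the 6 edges of $K_4$ and whose right vertices are the 4 vertices of $K_4$, so that $t=2$ and $r+1=3$ (the same incidence structure as in Subsection B with $r=2$). The function name is hypothetical.

```python
from itertools import combinations


def is_expander(left_neighbors, t, delta, gamma):
    """Brute-force check of the (t, r+1, delta, gamma)-expander property.

    left_neighbors: list whose i-th entry is the set of right neighbors of left vertex i.
    Checks |N(T)| >= gamma * t * |T| for every nonempty T with |T| <= delta * n.
    """
    n = len(left_neighbors)
    max_size = int(delta * n)
    for size in range(1, max_size + 1):
        for T in combinations(range(n), size):
            neighbors = set().union(*(left_neighbors[i] for i in T))
            if len(neighbors) < gamma * t * size:
                return False
    return True


edges = list(combinations(range(4), 2))  # left side: the 6 edges of K_4
k4 = [set(e) for e in edges]             # neighbors: the 2 endpoints of each edge
print(is_expander(k4, t=2, delta=0.34, gamma=0.75))  # True
print(is_expander(k4, t=2, delta=0.5, gamma=0.75))   # False
```

The second check fails because some 3-subsets of edges (for instance, a triangle) have fewer than $\gamma t\cdot 3=4.5$ neighbors.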


C. Multiple recovering sets. Large alphabets. Proof of Theorem C

In this section we show the existence of $(n,k,r,t)$
LRC codes over a sufficiently large finite field $\mathbb F_q$ with large
minimum distance and rate
$$R<1-\frac{t}{r+1}.\qquad(51)$$


Lemma 5.8: Let $G$ be a graph chosen uniformly from the
ensemble of $(t,r+1)$-regular bipartite graphs and let $n\to\infty$.
For a given $\gamma\in[\frac{1}{r+1},1-\frac1t)$, let $\delta$ be the positive solution of
the equation

Then for $0<\delta'<\delta$ and $\beta=t(1-\gamma)-1$,


The proof relies on the existence of regular bipartite graphs 
with good expansion properties. 

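For a subset $A\subseteq[m]$ construct a vector $x_A=
(x_{1,A},\dots,x_{m,A})\in\mathbb F_q^m$ as follows:
$$x_{i,A}=\begin{cases}0 & \text{if } i\notin A,\\ X_{i,A} & \text{if } i\in A,\end{cases}\qquad(52)$$
where $X_{i,A}$ are independent random variables uniformly distributed
over $\mathbb F_q^*=\mathbb F_q\setminus\{0\}$.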

Recall that a family of subsets of an $m$-set, $\mathcal S=
\{A_1,\dots,A_{|\mathcal S|}\}$, is said to satisfy Hall’s condition if for any
subfamily $\mathcal S'\subseteq\mathcal S$,

(53)


$$\Pr(\{G\text{ is a }(t,r+1,\delta',\gamma)\text{ expander}\})\ge1-O(n^{-\beta}).\qquad(55)$$

Note that the conditions on $\gamma$ in [23] are stated as $0<\gamma<
1-1/t$, but the last entropy function in (54) is not defined
for $\gamma<1/(r+1)$. However, for such $\gamma$ any subset of vertices
$T\subseteq V_1$ has at least $t|T|/(r+1)\ge\gamma t|T|$ distinct neighbors
in $V_2$, and therefore any $(t,r+1)$-regular bipartite graph is a
$(t,r+1,1,\gamma)$-expander.

Given a bipartite biregular graph $G$, define a family of
subsets $\mathcal S$ of the set $[n-k]$ as follows. Assume that the vertices
in $V_1$ (in $V_2$) are numbered from 1 to $n$ (from 1 to $p$). For
every vertex $i\in V_1$ form a subset $S_i=N_i\cup[p+1,n-k]$,
where $N_i\subseteq V_2$ is the set of neighbors of $i$ in $V_2$. Note that
on account of (51), which gives $p=nt/(r+1)<n-k$, each $S_i$ is indeed a subset of $[n-k]$.


$$|\mathcal S'|\le\Big|\bigcup_{A\in\mathcal S'}A\Big|.$$
Lemma 5.9: Let $G$ be a $(t,r+1,\delta,\gamma)$ expander, where the
variables $\delta,\gamma$ are the unique solution of (54) and the equation
$$\delta(1-t\gamma)=1-\frac{t}{r+1}-R\qquad(56)$$
in the range $\gamma\in[\frac{1}{r+1},\frac1t)$, where $R<1-t/(r+1)$. Consider
the sets $S_i$ defined above (before the statement of the lemma).
Then any family of subsets $\mathcal S=\{S_i:i\in I\}$ of size $|\mathcal S|\le\delta n$
satisfies Hall’s condition.

Proof: The proof is formed of two steps. First we show
that the $(t,r+1,\delta,\gamma)$ expander graph $G$ is well defined, i.e.,
that there is a pair of numbers $(\delta,\gamma)$ that satisfies (54) and (56).
Each of these equations defines $\delta$ as a continuous function of
$\gamma$. Let $\delta_1(\gamma)$ be the function defined by (56) and let $\delta_2(\gamma)$ be
defined by (54) (as argued in [23], $\delta_2$ is well defined in the
sense that Eq. (54) has a unique positive root $\delta$). The function
$\delta_1(\gamma)$ increases monotonically from a number less than 1 to
$+\infty$ as $\gamma$ ranges from $1/(r+1)$ to $1/t$. At the same time, the value
$\delta_2(1/(r+1))$ is determined by the equation

Since $t\ge2$ and $\ldots<1$, we conclude that $h(\delta)=0$, and
since $\delta\ne0$, this implies that $\delta_2(1/(r+1))=1$. From Lemma
5.8 and since $t\ge2$, for any $\gamma\in[\frac{1}{r+1},\frac1t)$ there exists an
expander graph for all $\delta'<\delta_2(\gamma)$. This implies that $\delta_2(\gamma)$
is a bounded function for $\gamma$ in this range; in fact, it is easy
to check that $\delta_2(\gamma)$ is a monotonically decreasing function.
Therefore, there exists exactly one $\gamma\in[\frac{1}{r+1},\frac1t)$ such that
$$\delta_1(\gamma)=\delta_2(\gamma).$$

Let us prove the claim about Hall’s condition. Let $I\subseteq[n]$
be a subset of indices of size $\delta'n\le\delta n$. Then
$$\begin{aligned}
\Big|\bigcup_{i\in I}S_i\Big|&=\Big|\bigcup_{i\in I}N_i\cup[p+1,n-k]\Big|\\
&=\Big|\bigcup_{i\in I}N_i\Big|+n-k-p\\
&\ge t\gamma\delta'n+n-k-p\\
&=n\Big(t\gamma\delta'+1-R-\frac{t}{r+1}\Big)\\
&\ge\delta'n,
\end{aligned}$$
where the first inequality follows from the expansion property
of $G$ and the second inequality follows from (56) and the fact
that $\delta'\le\delta$. ■

Now we are ready to complete our argument. 

Proof (of Theorem C): Let $G$ be the graph in Lemma
5.9 and let the corresponding family of subsets be $\mathcal S=
\{S_1,\dots,S_n\}$. On account of Lemma 5.7 there exists a set of
$n$ vectors of length $n-k$ such that any $\delta n$ of them are linearly
independent. Let these vectors form the columns of the parity-check matrix
$H$ of a code $C$. Then it is clear that the minimum distance
of that code is at least $\delta n$. Moreover, the first $p$ rows of the
matrix $H$ are of weight $r+1$, and they provide the locality
property for the code’s symbols.

Observe that the recovering sets defined by this construction
are not necessarily disjoint, but they become disjoint if the graph
contains no cycles of length 4. As shown in [?], the probability
that a random regular graph on $n$ vertices has no cycles of
length 4 is bounded away from zero as $n\to\infty$. As argued
in the last section of [?], the methods of that paper apply to
bipartite graphs, leading to a similar conclusion. At the same
time, Lemma 5.8 implies that the probability for a random
graph to have the claimed expanding properties approaches
one. Together these results imply that there exist $(t,r+1,\delta,\gamma)$
biregular bipartite expanding graphs with no cycles of length
4, i.e., that there exist $(n,k,r,t)$ LRC codes with the stated
parameters. This concludes the proof. ■


VI. Concluding remarks 

The problem of bounding the cardinality of LRC codes with
a given distance poses a number of interesting challenges even
for a single recovering set. While the asymptotic version of
this problem is presently in the same state as the asymptotic
problem of bounding the size of error-correcting codes without
the locality constraint, for finite parameters the only meaningful
bound that accounts for the size of the alphabet is the
shortening bound of [?]. We believe that the asymptotic GV-type
bound is in a certain sense “in a final form,” i.e., this
bound cannot be improved by studying ensembles of random
codes without bringing in significant new ideas. At the same
time, the field for the upper bounds seems to be open in the
sense that it should be possible to find bounds that improve on
the currently known results. In particular, since the structure
of LRC codes shows some similarity to LDPC codes, it is
likely that methods of deriving upper bounds on LDPC codes
could yield good upper bounds on LRC codes. We note that
straightforward application of techniques developed for LDPC
codes, e.g., [?], does not lead to improved upper bounds in the
LRC case.

For codes with multiple recovering sets it is difficult to
derive good lower or upper bounds because there is little
control over the structure of the sets. Nevertheless, we believe
that the GV-type bound for $t=2$ derived in this paper will be
difficult to improve by studying ensembles of random codes.
At the same time, it could be possible to use constructions on
algebraic curves to obtain improvements of the GV bound for
$t\ge2$. For the case $t=1$ such improvements were obtained
in the recent work [?].

Finally, an interesting open question is to establish (or
disprove) the tightness of the Singleton-like bound (?) for
multiple recovering sets in the case of large alphabets. A
related research direction is derandomizing the expander graph
lower bound derived in this paper.


Appendix 

Proof of Lemma 5.2: Consider the function
$$F(s)=\frac{1}{r+1}\ln b(s)-\delta\ln s.$$
The lemma will be proved if we show that for every $\delta\in
(0,(q-1)/q]$, $F(s)$ has a unique minimum attained for $0<
s\le1$. Setting $F'(s)=0$, we obtain the equation
$$f(s)=\delta,\qquad(57)$$
where
$$f(s)=\frac{1}{r+1}\cdot\frac{sb'(s)}{b(s)}.$$
It is easy to check that (57) has the solutions $(s=0,\delta=0)$
and $(s=1,\delta=(q-1)/q)$. Note that $F'(s)=(f(s)-\delta)/s$, so $F$ decreases where $f(s)<\delta$ and increases where $f(s)>\delta$. Suppose we prove that $f(s)$
is strictly monotone increasing for $s>0$. This would imply
that the inverse function $s=f^{-1}(\delta)$ is also strictly monotone
increasing in $\delta$, and therefore the minimizing value $f^{-1}(\delta)$
for all $0<\delta<(q-1)/q$ is unique and is located in the open
interval $(0,1)$.

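It remains to prove that $f(s)$ is indeed a strictly increasing
function of $s>0$. We have
$$f'(s)=\frac{1}{r+1}\cdot\frac{(sb'(s))'b(s)-s(b'(s))^2}{b(s)^2}.$$
Recalling that $b(s)$ is a polynomial in $s$ of degree $r+1$, let
us write it as
$$b(s)=1+\sum_{i=2}^{r+1}b_is^i,$$
where the coefficients $b_i$ are nonnegative. Next,
$$s(sb'(s))'b(s)-(sb'(s))^2=\sum_{i=2}^{r+1}i^2b_is^i+\sum_{i=2}^{r+1}i^2b_is^i\sum_{i=2}^{r+1}b_is^i-\Big(\sum_{i=2}^{r+1}ib_is^i\Big)^2.$$
The right-hand side of this equality is positive for all $s>0$ if
$$\sum_{i=2}^{r+1}i^2b_is^i\sum_{i=2}^{r+1}b_is^i-\Big(\sum_{i=2}^{r+1}ib_is^i\Big)^2\ge0.$$
But this last claim is simply the Cauchy-Schwarz inequality
for the real vectors $(i\sqrt{b_is^i})_{i=2}^{r+1}$ and $(\sqrt{b_is^i})_{i=2}^{r+1}$. The lemma
is proved.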
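A quick numerical check of the monotonicity argument: the sketch samples $f(s)=sb'(s)/((r+1)b(s))$ on a grid of $s\in[0,2]$ for a few $(q,r)$ and confirms that it is strictly increasing there, that $f(0)=0$, and that $f(1)=(q-1)/q$. A finite grid is evidence, not a proof; the proof is the Cauchy-Schwarz argument above. The parameter choices are arbitrary.

```python
def b(s, q, r):
    """b(s) from Eq. (42)."""
    return ((1 + (q - 1) * s) ** (r + 1) + (q - 1) * (1 - s) ** (r + 1)) / q


def b_prime(s, q, r):
    """Derivative b'(s), obtained by differentiating (42) term by term."""
    return (r + 1) * (q - 1) * ((1 + (q - 1) * s) ** r - (1 - s) ** r) / q


def f(s, q, r):
    """f(s) = s b'(s) / ((r + 1) b(s)); a stationary point of F satisfies f(s) = delta, Eq. (57)."""
    return s * b_prime(s, q, r) / ((r + 1) * b(s, q, r))


for q in (2, 3, 4, 7):
    for r in (1, 2, 5):
        values = [f(j / 1000, q, r) for j in range(2001)]  # s = 0, 0.001, ..., 2
        assert all(x < y for x, y in zip(values, values[1:])), (q, r)
        assert values[0] == 0 and abs(values[1000] - (q - 1) / q) < 1e-12
print("f is strictly increasing on the sampled grid")
```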

Note that since the minimizing value $s_0$ is less than 1 for
all $\delta<(q-1)/q$, and since the value of the minimum in (19)
increases from 0 to $r/(r+1)$ as $s_0$ ranges from 0 to 1, the
right-hand side of (19) is positive for all $\delta\in(0,(q-1)/q)$.